######################################### House Price Problem #################################################################

Ask a home buyer to describe their dream house, and they probably won't begin with the height of the basement ceiling or the proximity to an east-west railroad. But this playground competition's dataset proves that much more influences price negotiations than the number of bedrooms or a white-picket fence.
With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, this competition challenges you to predict the final price of each home.
Goal
It is your job to predict the sales price for each house. For each Id in the test set, you must predict the value of the SalePrice variable.
Metric
Submissions are evaluated on Root-Mean-Squared-Error (RMSE) between the logarithm of the predicted value and the logarithm of the observed sales price. (Taking logs means that errors in predicting expensive houses and cheap houses will affect the result equally.)
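To make the metric concrete, here is a minimal sketch of RMSE computed on log prices. The function name `rmse_of_logs` and the example prices are illustrative assumptions, not part of the competition material. It shows that a 10% over-prediction contributes the same error for a cheap house as for an expensive one.

```python
import numpy as np

def rmse_of_logs(y_true, y_pred):
    """RMSE between log(predicted) and log(observed) prices."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((np.log(y_pred) - np.log(y_true)) ** 2)))

# Made-up prices, for illustration only: each prediction is 10% too high.
cheap = rmse_of_logs([100_000], [110_000])
expensive = rmse_of_logs([500_000], [550_000])
print(cheap, expensive)  # both equal log(1.1) up to floating-point rounding
```

Because only the ratio of predicted to observed price enters the error, the absolute price level drops out.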
Reference link: https://www.kaggle.com/c/house-prices-advanced-regression-techniques/overview/description
File descriptions
train.csv - the training set
test.csv - the test set
data_description.txt - full description of each column, originally prepared by Dean De Cock but lightly edited to match the column names used here
sample_submission.csv - a benchmark submission from a linear regression on year and month of sale, lot square footage, and number of bedrooms
Data fields
Here's a brief version of what you'll find in the data description file.
SalePrice - the property's sale price in dollars. This is the target variable that you're trying to predict.
MSSubClass: The building class
MSZoning: The general zoning classification
LotFrontage: Linear feet of street connected to property
LotArea: Lot size in square feet
Street: Type of road access
Alley: Type of alley access
LotShape: General shape of property
LandContour: Flatness of the property
Utilities: Type of utilities available
LotConfig: Lot configuration
LandSlope: Slope of property
Neighborhood: Physical locations within Ames city limits
Condition1: Proximity to main road or railroad
Condition2: Proximity to main road or railroad (if a second is present)
BldgType: Type of dwelling
HouseStyle: Style of dwelling
OverallQual: Overall material and finish quality
OverallCond: Overall condition rating
YearBuilt: Original construction date
YearRemodAdd: Remodel date
RoofStyle: Type of roof
RoofMatl: Roof material
Exterior1st: Exterior covering on house
Exterior2nd: Exterior covering on house (if more than one material)
MasVnrType: Masonry veneer type
MasVnrArea: Masonry veneer area in square feet
ExterQual: Exterior material quality
ExterCond: Present condition of the material on the exterior
Foundation: Type of foundation
BsmtQual: Height of the basement
BsmtCond: General condition of the basement
BsmtExposure: Walkout or garden level basement walls
BsmtFinType1: Quality of basement finished area
BsmtFinSF1: Type 1 finished square feet
BsmtFinType2: Quality of second finished area (if present)
BsmtFinSF2: Type 2 finished square feet
BsmtUnfSF: Unfinished square feet of basement area
TotalBsmtSF: Total square feet of basement area
Heating: Type of heating
HeatingQC: Heating quality and condition
CentralAir: Central air conditioning
Electrical: Electrical system
1stFlrSF: First Floor square feet
2ndFlrSF: Second floor square feet
LowQualFinSF: Low quality finished square feet (all floors)
GrLivArea: Above grade (ground) living area square feet
BsmtFullBath: Basement full bathrooms
BsmtHalfBath: Basement half bathrooms
FullBath: Full bathrooms above grade
HalfBath: Half baths above grade
BedroomAbvGr: Number of bedrooms above basement level
KitchenAbvGr: Number of kitchens
KitchenQual: Kitchen quality
TotRmsAbvGrd: Total rooms above grade (does not include bathrooms)
Functional: Home functionality rating
Fireplaces: Number of fireplaces
FireplaceQu: Fireplace quality
GarageType: Garage location
GarageYrBlt: Year garage was built
GarageFinish: Interior finish of the garage
GarageCars: Size of garage in car capacity
GarageArea: Size of garage in square feet
GarageQual: Garage quality
GarageCond: Garage condition
PavedDrive: Paved driveway
WoodDeckSF: Wood deck area in square feet
OpenPorchSF: Open porch area in square feet
EnclosedPorch: Enclosed porch area in square feet
3SsnPorch: Three season porch area in square feet
ScreenPorch: Screen porch area in square feet
PoolArea: Pool area in square feet
PoolQC: Pool quality
Fence: Fence quality
MiscFeature: Miscellaneous feature not covered in other categories
MiscVal: Dollar value of miscellaneous feature
MoSold: Month Sold
YrSold: Year Sold
SaleType: Type of sale
SaleCondition: Condition of sale
## Import the necessary libraries.
import numpy as np ## NumPy: array creation and DataFrame-to-array conversion.
import pandas as pd ## pandas: loading data and working with DataFrames.
import os ## File-system access: working directory and paths for reading/writing files.
from sklearn.model_selection import train_test_split ## Splits data into training and validation sets.
import matplotlib.pyplot as plt ## Plotting.
import seaborn as sns ## Statistical visualization.
## Get current working directory.
os.getcwd()
'D:\\Python\\Pratice'
## Set working directory
os.chdir("D:\\DataScience\\Pratice\\House_Rent_Price")
## Read the train data set.
data = pd.read_csv("train.csv",header='infer',sep=',')
## Read the test data set.
test_data = pd.read_csv("test.csv",header='infer',sep=',')
## Set how many rows and columns are displayed in the Jupyter notebook.
pd.options.display.max_columns = 200
pd.get_option('display.max_rows')
pd.set_option('display.max_rows',None)
## Check dimensions of train data.
data.shape
(1460, 81)
## Check dimensions of test data.
test_data.shape
(1459, 80)
## Get first 5 records of train data.
data.head()
|  | Id | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | LotConfig | LandSlope | Neighborhood | Condition1 | Condition2 | BldgType | HouseStyle | OverallQual | OverallCond | YearBuilt | YearRemodAdd | RoofStyle | RoofMatl | Exterior1st | Exterior2nd | MasVnrType | MasVnrArea | ExterQual | ExterCond | Foundation | BsmtQual | BsmtCond | BsmtExposure | BsmtFinType1 | BsmtFinSF1 | BsmtFinType2 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | Heating | HeatingQC | CentralAir | Electrical | 1stFlrSF | 2ndFlrSF | LowQualFinSF | GrLivArea | BsmtFullBath | BsmtHalfBath | FullBath | HalfBath | BedroomAbvGr | KitchenAbvGr | KitchenQual | TotRmsAbvGrd | Functional | Fireplaces | FireplaceQu | GarageType | GarageYrBlt | GarageFinish | GarageCars | GarageArea | GarageQual | GarageCond | PavedDrive | WoodDeckSF | OpenPorchSF | EnclosedPorch | 3SsnPorch | ScreenPorch | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 60 | RL | 65.0 | 8450 | Pave | NaN | Reg | Lvl | AllPub | Inside | Gtl | CollgCr | Norm | Norm | 1Fam | 2Story | 7 | 5 | 2003 | 2003 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 196.0 | Gd | TA | PConc | Gd | TA | No | GLQ | 706 | Unf | 0 | 150 | 856 | GasA | Ex | Y | SBrkr | 856 | 854 | 0 | 1710 | 1 | 0 | 2 | 1 | 3 | 1 | Gd | 8 | Typ | 0 | NaN | Attchd | 2003.0 | RFn | 2 | 548 | TA | TA | Y | 0 | 61 | 0 | 0 | 0 | 0 | NaN | NaN | NaN | 0 | 2 | 2008 | WD | Normal | 208500 |
| 1 | 2 | 20 | RL | 80.0 | 9600 | Pave | NaN | Reg | Lvl | AllPub | FR2 | Gtl | Veenker | Feedr | Norm | 1Fam | 1Story | 6 | 8 | 1976 | 1976 | Gable | CompShg | MetalSd | MetalSd | None | 0.0 | TA | TA | CBlock | Gd | TA | Gd | ALQ | 978 | Unf | 0 | 284 | 1262 | GasA | Ex | Y | SBrkr | 1262 | 0 | 0 | 1262 | 0 | 1 | 2 | 0 | 3 | 1 | TA | 6 | Typ | 1 | TA | Attchd | 1976.0 | RFn | 2 | 460 | TA | TA | Y | 298 | 0 | 0 | 0 | 0 | 0 | NaN | NaN | NaN | 0 | 5 | 2007 | WD | Normal | 181500 |
| 2 | 3 | 60 | RL | 68.0 | 11250 | Pave | NaN | IR1 | Lvl | AllPub | Inside | Gtl | CollgCr | Norm | Norm | 1Fam | 2Story | 7 | 5 | 2001 | 2002 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 162.0 | Gd | TA | PConc | Gd | TA | Mn | GLQ | 486 | Unf | 0 | 434 | 920 | GasA | Ex | Y | SBrkr | 920 | 866 | 0 | 1786 | 1 | 0 | 2 | 1 | 3 | 1 | Gd | 6 | Typ | 1 | TA | Attchd | 2001.0 | RFn | 2 | 608 | TA | TA | Y | 0 | 42 | 0 | 0 | 0 | 0 | NaN | NaN | NaN | 0 | 9 | 2008 | WD | Normal | 223500 |
| 3 | 4 | 70 | RL | 60.0 | 9550 | Pave | NaN | IR1 | Lvl | AllPub | Corner | Gtl | Crawfor | Norm | Norm | 1Fam | 2Story | 7 | 5 | 1915 | 1970 | Gable | CompShg | Wd Sdng | Wd Shng | None | 0.0 | TA | TA | BrkTil | TA | Gd | No | ALQ | 216 | Unf | 0 | 540 | 756 | GasA | Gd | Y | SBrkr | 961 | 756 | 0 | 1717 | 1 | 0 | 1 | 0 | 3 | 1 | Gd | 7 | Typ | 1 | Gd | Detchd | 1998.0 | Unf | 3 | 642 | TA | TA | Y | 0 | 35 | 272 | 0 | 0 | 0 | NaN | NaN | NaN | 0 | 2 | 2006 | WD | Abnorml | 140000 |
| 4 | 5 | 60 | RL | 84.0 | 14260 | Pave | NaN | IR1 | Lvl | AllPub | FR2 | Gtl | NoRidge | Norm | Norm | 1Fam | 2Story | 8 | 5 | 2000 | 2000 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 350.0 | Gd | TA | PConc | Gd | TA | Av | GLQ | 655 | Unf | 0 | 490 | 1145 | GasA | Ex | Y | SBrkr | 1145 | 1053 | 0 | 2198 | 1 | 0 | 2 | 1 | 4 | 1 | Gd | 9 | Typ | 1 | TA | Attchd | 2000.0 | RFn | 3 | 836 | TA | TA | Y | 192 | 84 | 0 | 0 | 0 | 0 | NaN | NaN | NaN | 0 | 12 | 2008 | WD | Normal | 250000 |
## Get first 5 records of test data.
test_data.head()
|  | Id | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | LotConfig | LandSlope | Neighborhood | Condition1 | Condition2 | BldgType | HouseStyle | OverallQual | OverallCond | YearBuilt | YearRemodAdd | RoofStyle | RoofMatl | Exterior1st | Exterior2nd | MasVnrType | MasVnrArea | ExterQual | ExterCond | Foundation | BsmtQual | BsmtCond | BsmtExposure | BsmtFinType1 | BsmtFinSF1 | BsmtFinType2 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | Heating | HeatingQC | CentralAir | Electrical | 1stFlrSF | 2ndFlrSF | LowQualFinSF | GrLivArea | BsmtFullBath | BsmtHalfBath | FullBath | HalfBath | BedroomAbvGr | KitchenAbvGr | KitchenQual | TotRmsAbvGrd | Functional | Fireplaces | FireplaceQu | GarageType | GarageYrBlt | GarageFinish | GarageCars | GarageArea | GarageQual | GarageCond | PavedDrive | WoodDeckSF | OpenPorchSF | EnclosedPorch | 3SsnPorch | ScreenPorch | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1461 | 20 | RH | 80.0 | 11622 | Pave | NaN | Reg | Lvl | AllPub | Inside | Gtl | NAmes | Feedr | Norm | 1Fam | 1Story | 5 | 6 | 1961 | 1961 | Gable | CompShg | VinylSd | VinylSd | None | 0.0 | TA | TA | CBlock | TA | TA | No | Rec | 468.0 | LwQ | 144.0 | 270.0 | 882.0 | GasA | TA | Y | SBrkr | 896 | 0 | 0 | 896 | 0.0 | 0.0 | 1 | 0 | 2 | 1 | TA | 5 | Typ | 0 | NaN | Attchd | 1961.0 | Unf | 1.0 | 730.0 | TA | TA | Y | 140 | 0 | 0 | 0 | 120 | 0 | NaN | MnPrv | NaN | 0 | 6 | 2010 | WD | Normal |
| 1 | 1462 | 20 | RL | 81.0 | 14267 | Pave | NaN | IR1 | Lvl | AllPub | Corner | Gtl | NAmes | Norm | Norm | 1Fam | 1Story | 6 | 6 | 1958 | 1958 | Hip | CompShg | Wd Sdng | Wd Sdng | BrkFace | 108.0 | TA | TA | CBlock | TA | TA | No | ALQ | 923.0 | Unf | 0.0 | 406.0 | 1329.0 | GasA | TA | Y | SBrkr | 1329 | 0 | 0 | 1329 | 0.0 | 0.0 | 1 | 1 | 3 | 1 | Gd | 6 | Typ | 0 | NaN | Attchd | 1958.0 | Unf | 1.0 | 312.0 | TA | TA | Y | 393 | 36 | 0 | 0 | 0 | 0 | NaN | NaN | Gar2 | 12500 | 6 | 2010 | WD | Normal |
| 2 | 1463 | 60 | RL | 74.0 | 13830 | Pave | NaN | IR1 | Lvl | AllPub | Inside | Gtl | Gilbert | Norm | Norm | 1Fam | 2Story | 5 | 5 | 1997 | 1998 | Gable | CompShg | VinylSd | VinylSd | None | 0.0 | TA | TA | PConc | Gd | TA | No | GLQ | 791.0 | Unf | 0.0 | 137.0 | 928.0 | GasA | Gd | Y | SBrkr | 928 | 701 | 0 | 1629 | 0.0 | 0.0 | 2 | 1 | 3 | 1 | TA | 6 | Typ | 1 | TA | Attchd | 1997.0 | Fin | 2.0 | 482.0 | TA | TA | Y | 212 | 34 | 0 | 0 | 0 | 0 | NaN | MnPrv | NaN | 0 | 3 | 2010 | WD | Normal |
| 3 | 1464 | 60 | RL | 78.0 | 9978 | Pave | NaN | IR1 | Lvl | AllPub | Inside | Gtl | Gilbert | Norm | Norm | 1Fam | 2Story | 6 | 6 | 1998 | 1998 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 20.0 | TA | TA | PConc | TA | TA | No | GLQ | 602.0 | Unf | 0.0 | 324.0 | 926.0 | GasA | Ex | Y | SBrkr | 926 | 678 | 0 | 1604 | 0.0 | 0.0 | 2 | 1 | 3 | 1 | Gd | 7 | Typ | 1 | Gd | Attchd | 1998.0 | Fin | 2.0 | 470.0 | TA | TA | Y | 360 | 36 | 0 | 0 | 0 | 0 | NaN | NaN | NaN | 0 | 6 | 2010 | WD | Normal |
| 4 | 1465 | 120 | RL | 43.0 | 5005 | Pave | NaN | IR1 | HLS | AllPub | Inside | Gtl | StoneBr | Norm | Norm | TwnhsE | 1Story | 8 | 5 | 1992 | 1992 | Gable | CompShg | HdBoard | HdBoard | None | 0.0 | Gd | TA | PConc | Gd | TA | No | ALQ | 263.0 | Unf | 0.0 | 1017.0 | 1280.0 | GasA | Ex | Y | SBrkr | 1280 | 0 | 0 | 1280 | 0.0 | 0.0 | 2 | 0 | 2 | 1 | Gd | 5 | Typ | 0 | NaN | Attchd | 1992.0 | RFn | 2.0 | 506.0 | TA | TA | Y | 0 | 82 | 0 | 0 | 144 | 0 | NaN | NaN | NaN | 0 | 1 | 2010 | WD | Normal |
## Check summary statistics of train data.
data.describe(include='all')
|  | Id | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | LotConfig | LandSlope | Neighborhood | Condition1 | Condition2 | BldgType | HouseStyle | OverallQual | OverallCond | YearBuilt | YearRemodAdd | RoofStyle | RoofMatl | Exterior1st | Exterior2nd | MasVnrType | MasVnrArea | ExterQual | ExterCond | Foundation | BsmtQual | BsmtCond | BsmtExposure | BsmtFinType1 | BsmtFinSF1 | BsmtFinType2 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | Heating | HeatingQC | CentralAir | Electrical | 1stFlrSF | 2ndFlrSF | LowQualFinSF | GrLivArea | BsmtFullBath | BsmtHalfBath | FullBath | HalfBath | BedroomAbvGr | KitchenAbvGr | KitchenQual | TotRmsAbvGrd | Functional | Fireplaces | FireplaceQu | GarageType | GarageYrBlt | GarageFinish | GarageCars | GarageArea | GarageQual | GarageCond | PavedDrive | WoodDeckSF | OpenPorchSF | EnclosedPorch | 3SsnPorch | ScreenPorch | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1460.000000 | 1460.000000 | 1460 | 1201.000000 | 1460.000000 | 1460 | 91 | 1460 | 1460 | 1460 | 1460 | 1460 | 1460 | 1460 | 1460 | 1460 | 1460 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460 | 1460 | 1460 | 1460 | 1452 | 1452.000000 | 1460 | 1460 | 1460 | 1423 | 1423 | 1422 | 1423 | 1460.000000 | 1422 | 1460.000000 | 1460.000000 | 1460.000000 | 1460 | 1460 | 1460 | 1459 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460 | 1460.000000 | 1460 | 1460.000000 | 770 | 1379 | 1379.000000 | 1379 | 1460.000000 | 1460.000000 | 1379 | 1379 | 1460 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 1460.000000 | 7 | 281 | 54 | 1460.000000 | 1460.000000 | 1460.000000 | 1460 | 1460 | 1460.000000 |
| unique | NaN | NaN | 5 | NaN | NaN | 2 | 2 | 4 | 4 | 2 | 5 | 3 | 25 | 9 | 8 | 5 | 8 | NaN | NaN | NaN | NaN | 6 | 8 | 15 | 16 | 4 | NaN | 4 | 5 | 6 | 4 | 4 | 4 | 6 | NaN | 6 | NaN | NaN | NaN | 6 | 5 | 2 | 5 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 4 | NaN | 7 | NaN | 5 | 6 | NaN | 3 | NaN | NaN | 5 | 5 | 3 | NaN | NaN | NaN | NaN | NaN | NaN | 3 | 4 | 4 | NaN | NaN | NaN | 9 | 6 | NaN |
| top | NaN | NaN | RL | NaN | NaN | Pave | Grvl | Reg | Lvl | AllPub | Inside | Gtl | NAmes | Norm | Norm | 1Fam | 1Story | NaN | NaN | NaN | NaN | Gable | CompShg | VinylSd | VinylSd | None | NaN | TA | TA | PConc | TA | TA | No | Unf | NaN | Unf | NaN | NaN | NaN | GasA | Ex | Y | SBrkr | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | TA | NaN | Typ | NaN | Gd | Attchd | NaN | Unf | NaN | NaN | TA | TA | Y | NaN | NaN | NaN | NaN | NaN | NaN | Gd | MnPrv | Shed | NaN | NaN | NaN | WD | Normal | NaN |
| freq | NaN | NaN | 1151 | NaN | NaN | 1454 | 50 | 925 | 1311 | 1459 | 1052 | 1382 | 225 | 1260 | 1445 | 1220 | 726 | NaN | NaN | NaN | NaN | 1141 | 1434 | 515 | 504 | 864 | NaN | 906 | 1282 | 647 | 649 | 1311 | 953 | 430 | NaN | 1256 | NaN | NaN | NaN | 1428 | 741 | 1365 | 1334 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 735 | NaN | 1360 | NaN | 380 | 870 | NaN | 605 | NaN | NaN | 1311 | 1326 | 1340 | NaN | NaN | NaN | NaN | NaN | NaN | 3 | 157 | 49 | NaN | NaN | NaN | 1267 | 1198 | NaN |
| mean | 730.500000 | 56.897260 | NaN | 70.049958 | 10516.828082 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 6.099315 | 5.575342 | 1971.267808 | 1984.865753 | NaN | NaN | NaN | NaN | NaN | 103.685262 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 443.639726 | NaN | 46.549315 | 567.240411 | 1057.429452 | NaN | NaN | NaN | NaN | 1162.626712 | 346.992466 | 5.844521 | 1515.463699 | 0.425342 | 0.057534 | 1.565068 | 0.382877 | 2.866438 | 1.046575 | NaN | 6.517808 | NaN | 0.613014 | NaN | NaN | 1978.506164 | NaN | 1.767123 | 472.980137 | NaN | NaN | NaN | 94.244521 | 46.660274 | 21.954110 | 3.409589 | 15.060959 | 2.758904 | NaN | NaN | NaN | 43.489041 | 6.321918 | 2007.815753 | NaN | NaN | 180921.195890 |
| std | 421.610009 | 42.300571 | NaN | 24.284752 | 9981.264932 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.382997 | 1.112799 | 30.202904 | 20.645407 | NaN | NaN | NaN | NaN | NaN | 181.066207 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 456.098091 | NaN | 161.319273 | 441.866955 | 438.705324 | NaN | NaN | NaN | NaN | 386.587738 | 436.528436 | 48.623081 | 525.480383 | 0.518911 | 0.238753 | 0.550916 | 0.502885 | 0.815778 | 0.220338 | NaN | 1.625393 | NaN | 0.644666 | NaN | NaN | 24.689725 | NaN | 0.747315 | 213.804841 | NaN | NaN | NaN | 125.338794 | 66.256028 | 61.119149 | 29.317331 | 55.757415 | 40.177307 | NaN | NaN | NaN | 496.123024 | 2.703626 | 1.328095 | NaN | NaN | 79442.502883 |
| min | 1.000000 | 20.000000 | NaN | 21.000000 | 1300.000000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.000000 | 1.000000 | 1872.000000 | 1950.000000 | NaN | NaN | NaN | NaN | NaN | 0.000000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.000000 | NaN | 0.000000 | 0.000000 | 0.000000 | NaN | NaN | NaN | NaN | 334.000000 | 0.000000 | 0.000000 | 334.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | NaN | 2.000000 | NaN | 0.000000 | NaN | NaN | 1900.000000 | NaN | 0.000000 | 0.000000 | NaN | NaN | NaN | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | NaN | NaN | NaN | 0.000000 | 1.000000 | 2006.000000 | NaN | NaN | 34900.000000 |
| 25% | 365.750000 | 20.000000 | NaN | 59.000000 | 7553.500000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 5.000000 | 5.000000 | 1954.000000 | 1967.000000 | NaN | NaN | NaN | NaN | NaN | 0.000000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.000000 | NaN | 0.000000 | 223.000000 | 795.750000 | NaN | NaN | NaN | NaN | 882.000000 | 0.000000 | 0.000000 | 1129.500000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 2.000000 | 1.000000 | NaN | 5.000000 | NaN | 0.000000 | NaN | NaN | 1961.000000 | NaN | 1.000000 | 334.500000 | NaN | NaN | NaN | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | NaN | NaN | NaN | 0.000000 | 5.000000 | 2007.000000 | NaN | NaN | 129975.000000 |
| 50% | 730.500000 | 50.000000 | NaN | 69.000000 | 9478.500000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 6.000000 | 5.000000 | 1973.000000 | 1994.000000 | NaN | NaN | NaN | NaN | NaN | 0.000000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 383.500000 | NaN | 0.000000 | 477.500000 | 991.500000 | NaN | NaN | NaN | NaN | 1087.000000 | 0.000000 | 0.000000 | 1464.000000 | 0.000000 | 0.000000 | 2.000000 | 0.000000 | 3.000000 | 1.000000 | NaN | 6.000000 | NaN | 1.000000 | NaN | NaN | 1980.000000 | NaN | 2.000000 | 480.000000 | NaN | NaN | NaN | 0.000000 | 25.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | NaN | NaN | NaN | 0.000000 | 6.000000 | 2008.000000 | NaN | NaN | 163000.000000 |
| 75% | 1095.250000 | 70.000000 | NaN | 80.000000 | 11601.500000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 7.000000 | 6.000000 | 2000.000000 | 2004.000000 | NaN | NaN | NaN | NaN | NaN | 166.000000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 712.250000 | NaN | 0.000000 | 808.000000 | 1298.250000 | NaN | NaN | NaN | NaN | 1391.250000 | 728.000000 | 0.000000 | 1776.750000 | 1.000000 | 0.000000 | 2.000000 | 1.000000 | 3.000000 | 1.000000 | NaN | 7.000000 | NaN | 1.000000 | NaN | NaN | 2002.000000 | NaN | 2.000000 | 576.000000 | NaN | NaN | NaN | 168.000000 | 68.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | NaN | NaN | NaN | 0.000000 | 8.000000 | 2009.000000 | NaN | NaN | 214000.000000 |
| max | 1460.000000 | 190.000000 | NaN | 313.000000 | 215245.000000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 10.000000 | 9.000000 | 2010.000000 | 2010.000000 | NaN | NaN | NaN | NaN | NaN | 1600.000000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 5644.000000 | NaN | 1474.000000 | 2336.000000 | 6110.000000 | NaN | NaN | NaN | NaN | 4692.000000 | 2065.000000 | 572.000000 | 5642.000000 | 3.000000 | 2.000000 | 3.000000 | 2.000000 | 8.000000 | 3.000000 | NaN | 14.000000 | NaN | 3.000000 | NaN | NaN | 2010.000000 | NaN | 4.000000 | 1418.000000 | NaN | NaN | NaN | 857.000000 | 547.000000 | 552.000000 | 508.000000 | 480.000000 | 738.000000 | NaN | NaN | NaN | 15500.000000 | 12.000000 | 2010.000000 | NaN | NaN | 755000.000000 |
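The `count` row above reports non-null values, so any column below 1460 has missing entries (for example, LotFrontage has 1201, Alley 91, and PoolQC only 7). The sketch below shows one way to list these directly. The helper name `missing_summary` and the toy frame are assumptions made for illustration; in the notebook it would be called as `missing_summary(data)`.

```python
import numpy as np
import pandas as pd

def missing_summary(df):
    """Return count and percentage of missing values per column, largest first."""
    n_missing = df.isna().sum()
    summary = pd.DataFrame({
        "n_missing": n_missing,
        "pct_missing": (100 * n_missing / len(df)).round(2),
    })
    # Keep only columns that actually have gaps.
    return summary[summary["n_missing"] > 0].sort_values("n_missing", ascending=False)

# Toy frame standing in for the training data (values are invented).
toy = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 68.0, 60.0],
    "Alley": [np.nan, np.nan, np.nan, "Grvl"],
    "LotArea": [8450, 9600, 11250, 9550],
})
report = missing_summary(toy)
print(report)
```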
## Check summary statistics of test data.
test_data.describe(include='all')
|  | Id | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | LotConfig | LandSlope | Neighborhood | Condition1 | Condition2 | BldgType | HouseStyle | OverallQual | OverallCond | YearBuilt | YearRemodAdd | RoofStyle | RoofMatl | Exterior1st | Exterior2nd | MasVnrType | MasVnrArea | ExterQual | ExterCond | Foundation | BsmtQual | BsmtCond | BsmtExposure | BsmtFinType1 | BsmtFinSF1 | BsmtFinType2 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | Heating | HeatingQC | CentralAir | Electrical | 1stFlrSF | 2ndFlrSF | LowQualFinSF | GrLivArea | BsmtFullBath | BsmtHalfBath | FullBath | HalfBath | BedroomAbvGr | KitchenAbvGr | KitchenQual | TotRmsAbvGrd | Functional | Fireplaces | FireplaceQu | GarageType | GarageYrBlt | GarageFinish | GarageCars | GarageArea | GarageQual | GarageCond | PavedDrive | WoodDeckSF | OpenPorchSF | EnclosedPorch | 3SsnPorch | ScreenPorch | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1459.000000 | 1459.000000 | 1455 | 1232.000000 | 1459.000000 | 1459 | 107 | 1459 | 1459 | 1457 | 1459 | 1459 | 1459 | 1459 | 1459 | 1459 | 1459 | 1459.000000 | 1459.000000 | 1459.000000 | 1459.000000 | 1459 | 1459 | 1458 | 1458 | 1443 | 1444.000000 | 1459 | 1459 | 1459 | 1415 | 1414 | 1415 | 1417 | 1458.000000 | 1417 | 1458.000000 | 1458.000000 | 1458.000000 | 1459 | 1459 | 1459 | 1459 | 1459.000000 | 1459.000000 | 1459.000000 | 1459.000000 | 1457.000000 | 1457.000000 | 1459.000000 | 1459.000000 | 1459.000000 | 1459.000000 | 1458 | 1459.000000 | 1457 | 1459.00000 | 729 | 1383 | 1381.000000 | 1381 | 1458.000000 | 1458.000000 | 1381 | 1381 | 1459 | 1459.000000 | 1459.000000 | 1459.000000 | 1459.000000 | 1459.000000 | 1459.000000 | 3 | 290 | 51 | 1459.000000 | 1459.000000 | 1459.000000 | 1458 | 1459 |
| unique | NaN | NaN | 5 | NaN | NaN | 2 | 2 | 4 | 4 | 1 | 5 | 3 | 25 | 9 | 5 | 5 | 7 | NaN | NaN | NaN | NaN | 6 | 4 | 13 | 15 | 4 | NaN | 4 | 5 | 6 | 4 | 4 | 4 | 6 | NaN | 6 | NaN | NaN | NaN | 4 | 5 | 2 | 4 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 4 | NaN | 7 | NaN | 5 | 6 | NaN | 3 | NaN | NaN | 4 | 5 | 3 | NaN | NaN | NaN | NaN | NaN | NaN | 2 | 4 | 3 | NaN | NaN | NaN | 9 | 6 |
| top | NaN | NaN | RL | NaN | NaN | Pave | Grvl | Reg | Lvl | AllPub | Inside | Gtl | NAmes | Norm | Norm | 1Fam | 1Story | NaN | NaN | NaN | NaN | Gable | CompShg | VinylSd | VinylSd | None | NaN | TA | TA | PConc | TA | TA | No | GLQ | NaN | Unf | NaN | NaN | NaN | GasA | Ex | Y | SBrkr | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | TA | NaN | Typ | NaN | Gd | Attchd | NaN | Unf | NaN | NaN | TA | TA | Y | NaN | NaN | NaN | NaN | NaN | NaN | Ex | MnPrv | Shed | NaN | NaN | NaN | WD | Normal |
| freq | NaN | NaN | 1114 | NaN | NaN | 1453 | 70 | 934 | 1311 | 1457 | 1081 | 1396 | 218 | 1251 | 1444 | 1205 | 745 | NaN | NaN | NaN | NaN | 1169 | 1442 | 510 | 510 | 878 | NaN | 892 | 1256 | 661 | 634 | 1295 | 951 | 431 | NaN | 1237 | NaN | NaN | NaN | 1446 | 752 | 1358 | 1337 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 757 | NaN | 1357 | NaN | 364 | 853 | NaN | 625 | NaN | NaN | 1293 | 1328 | 1301 | NaN | NaN | NaN | NaN | NaN | NaN | 2 | 172 | 46 | NaN | NaN | NaN | 1258 | 1204 |
| mean | 2190.000000 | 57.378341 | NaN | 68.580357 | 9819.161069 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 6.078821 | 5.553804 | 1971.357779 | 1983.662783 | NaN | NaN | NaN | NaN | NaN | 100.709141 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 439.203704 | NaN | 52.619342 | 554.294925 | 1046.117970 | NaN | NaN | NaN | NaN | 1156.534613 | 325.967786 | 3.543523 | 1486.045922 | 0.434454 | 0.065202 | 1.570939 | 0.377656 | 2.854010 | 1.042495 | NaN | 6.385195 | NaN | 0.58122 | NaN | NaN | 1977.721217 | NaN | 1.766118 | 472.768861 | NaN | NaN | NaN | 93.174777 | 48.313914 | 24.243317 | 1.794380 | 17.064428 | 1.744345 | NaN | NaN | NaN | 58.167923 | 6.104181 | 2007.769705 | NaN | NaN |
| std | 421.321334 | 42.746880 | NaN | 22.376841 | 4955.517327 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.436812 | 1.113740 | 30.390071 | 21.130467 | NaN | NaN | NaN | NaN | NaN | 177.625900 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 455.268042 | NaN | 176.753926 | 437.260486 | 442.898624 | NaN | NaN | NaN | NaN | 398.165820 | 420.610226 | 44.043251 | 485.566099 | 0.530648 | 0.252468 | 0.555190 | 0.503017 | 0.829788 | 0.208472 | NaN | 1.508895 | NaN | 0.64742 | NaN | NaN | 26.431175 | NaN | 0.775945 | 217.048611 | NaN | NaN | NaN | 127.744882 | 68.883364 | 67.227765 | 20.207842 | 56.609763 | 30.491646 | NaN | NaN | NaN | 630.806978 | 2.722432 | 1.301740 | NaN | NaN |
| min | 1461.000000 | 20.000000 | NaN | 21.000000 | 1470.000000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.000000 | 1.000000 | 1879.000000 | 1950.000000 | NaN | NaN | NaN | NaN | NaN | 0.000000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.000000 | NaN | 0.000000 | 0.000000 | 0.000000 | NaN | NaN | NaN | NaN | 407.000000 | 0.000000 | 0.000000 | 407.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | NaN | 3.000000 | NaN | 0.00000 | NaN | NaN | 1895.000000 | NaN | 0.000000 | 0.000000 | NaN | NaN | NaN | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | NaN | NaN | NaN | 0.000000 | 1.000000 | 2006.000000 | NaN | NaN |
| 25% | 1825.500000 | 20.000000 | NaN | 58.000000 | 7391.000000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 5.000000 | 5.000000 | 1953.000000 | 1963.000000 | NaN | NaN | NaN | NaN | NaN | 0.000000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.000000 | NaN | 0.000000 | 219.250000 | 784.000000 | NaN | NaN | NaN | NaN | 873.500000 | 0.000000 | 0.000000 | 1117.500000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 2.000000 | 1.000000 | NaN | 5.000000 | NaN | 0.00000 | NaN | NaN | 1959.000000 | NaN | 1.000000 | 318.000000 | NaN | NaN | NaN | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | NaN | NaN | NaN | 0.000000 | 4.000000 | 2007.000000 | NaN | NaN |
| 50% | 2190.000000 | 50.000000 | NaN | 67.000000 | 9399.000000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 6.000000 | 5.000000 | 1973.000000 | 1992.000000 | NaN | NaN | NaN | NaN | NaN | 0.000000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 350.500000 | NaN | 0.000000 | 460.000000 | 988.000000 | NaN | NaN | NaN | NaN | 1079.000000 | 0.000000 | 0.000000 | 1432.000000 | 0.000000 | 0.000000 | 2.000000 | 0.000000 | 3.000000 | 1.000000 | NaN | 6.000000 | NaN | 0.00000 | NaN | NaN | 1979.000000 | NaN | 2.000000 | 480.000000 | NaN | NaN | NaN | 0.000000 | 28.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | NaN | NaN | NaN | 0.000000 | 6.000000 | 2008.000000 | NaN | NaN |
| 75% | 2554.500000 | 70.000000 | NaN | 80.000000 | 11517.500000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 7.000000 | 6.000000 | 2001.000000 | 2004.000000 | NaN | NaN | NaN | NaN | NaN | 164.000000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 753.500000 | NaN | 0.000000 | 797.750000 | 1305.000000 | NaN | NaN | NaN | NaN | 1382.500000 | 676.000000 | 0.000000 | 1721.000000 | 1.000000 | 0.000000 | 2.000000 | 1.000000 | 3.000000 | 1.000000 | NaN | 7.000000 | NaN | 1.00000 | NaN | NaN | 2002.000000 | NaN | 2.000000 | 576.000000 | NaN | NaN | NaN | 168.000000 | 72.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | NaN | NaN | NaN | 0.000000 | 8.000000 | 2009.000000 | NaN | NaN |
| max | 2919.000000 | 190.000000 | NaN | 200.000000 | 56600.000000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 10.000000 | 9.000000 | 2010.000000 | 2010.000000 | NaN | NaN | NaN | NaN | NaN | 1290.000000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 4010.000000 | NaN | 1526.000000 | 2140.000000 | 5095.000000 | NaN | NaN | NaN | NaN | 5095.000000 | 1862.000000 | 1064.000000 | 5095.000000 | 3.000000 | 2.000000 | 4.000000 | 2.000000 | 6.000000 | 2.000000 | NaN | 15.000000 | NaN | 4.00000 | NaN | NaN | 2207.000000 | NaN | 5.000000 | 1488.000000 | NaN | NaN | NaN | 1424.000000 | 742.000000 | 1012.000000 | 360.000000 | 576.000000 | 800.000000 | NaN | NaN | NaN | 17000.000000 | 12.000000 | 2010.000000 | NaN | NaN |
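One value in the `max` row stands out: GarageYrBlt reaches 2207 in the test data, while YrSold never exceeds 2010. The sketch below is one possible way to flag such rows. The helper name `rows_with_year_after_sale` and the toy rows are assumptions made for illustration, and how to treat a flagged row (correct it, set it to missing, or leave it) is a separate decision that this sketch does not make.

```python
import pandas as pd

def rows_with_year_after_sale(df, year_col, sold_col="YrSold"):
    """Return rows whose year_col is later than the year of sale."""
    # Comparisons against NaN are False, so missing years are not flagged.
    return df[df[year_col] > df[sold_col]]

# Toy rows (invented), including one impossible garage year.
toy = pd.DataFrame({
    "Id": [1, 2, 3],
    "GarageYrBlt": [1961.0, 2207.0, None],
    "YrSold": [2010, 2007, 2009],
})
suspect = rows_with_year_after_sale(toy, "GarageYrBlt")
print(suspect)
```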
## Get the data type of each column in the train data.
data.dtypes
Id              int64
MSSubClass      int64
MSZoning        object
LotFrontage     float64
LotArea         int64
Street          object
Alley           object
LotShape        object
LandContour     object
Utilities       object
LotConfig       object
LandSlope       object
Neighborhood    object
Condition1      object
Condition2      object
BldgType        object
HouseStyle      object
OverallQual     int64
OverallCond     int64
YearBuilt       int64
YearRemodAdd    int64
RoofStyle       object
RoofMatl        object
Exterior1st     object
Exterior2nd     object
MasVnrType      object
MasVnrArea      float64
ExterQual       object
ExterCond       object
Foundation      object
BsmtQual        object
BsmtCond        object
BsmtExposure    object
BsmtFinType1    object
BsmtFinSF1      int64
BsmtFinType2    object
BsmtFinSF2      int64
BsmtUnfSF       int64
TotalBsmtSF     int64
Heating         object
HeatingQC       object
CentralAir      object
Electrical      object
1stFlrSF        int64
2ndFlrSF        int64
LowQualFinSF    int64
GrLivArea       int64
BsmtFullBath    int64
BsmtHalfBath    int64
FullBath        int64
HalfBath        int64
BedroomAbvGr    int64
KitchenAbvGr    int64
KitchenQual     object
TotRmsAbvGrd    int64
Functional      object
Fireplaces      int64
FireplaceQu     object
GarageType      object
GarageYrBlt     float64
GarageFinish    object
GarageCars      int64
GarageArea      int64
GarageQual      object
GarageCond      object
PavedDrive      object
WoodDeckSF      int64
OpenPorchSF     int64
EnclosedPorch   int64
3SsnPorch       int64
ScreenPorch     int64
PoolArea        int64
PoolQC          object
Fence           object
MiscFeature     object
MiscVal         int64
MoSold          int64
YrSold          int64
SaleType        object
SaleCondition   object
SalePrice       int64
dtype: object
## Get columns data types of test data.
test_data.dtypes
Id int64 MSSubClass int64 MSZoning object LotFrontage float64 LotArea int64 Street object Alley object LotShape object LandContour object Utilities object LotConfig object LandSlope object Neighborhood object Condition1 object Condition2 object BldgType object HouseStyle object OverallQual int64 OverallCond int64 YearBuilt int64 YearRemodAdd int64 RoofStyle object RoofMatl object Exterior1st object Exterior2nd object MasVnrType object MasVnrArea float64 ExterQual object ExterCond object Foundation object BsmtQual object BsmtCond object BsmtExposure object BsmtFinType1 object BsmtFinSF1 float64 BsmtFinType2 object BsmtFinSF2 float64 BsmtUnfSF float64 TotalBsmtSF float64 Heating object HeatingQC object CentralAir object Electrical object 1stFlrSF int64 2ndFlrSF int64 LowQualFinSF int64 GrLivArea int64 BsmtFullBath float64 BsmtHalfBath float64 FullBath int64 HalfBath int64 BedroomAbvGr int64 KitchenAbvGr int64 KitchenQual object TotRmsAbvGrd int64 Functional object Fireplaces int64 FireplaceQu object GarageType object GarageYrBlt float64 GarageFinish object GarageCars float64 GarageArea float64 GarageQual object GarageCond object PavedDrive object WoodDeckSF int64 OpenPorchSF int64 EnclosedPorch int64 3SsnPorch int64 ScreenPorch int64 PoolArea int64 PoolQC object Fence object MiscFeature object MiscVal int64 MoSold int64 YrSold int64 SaleType object SaleCondition object dtype: object
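The two dtype listings above differ in a few columns: BsmtFinSF1 and GarageCars, for example, are int64 in the train data but float64 in the test data. The likely cause is that those test columns contain NaN, which pandas stores as float. A minimal sketch for listing such mismatches is shown below; the toy frames and the helper name `dtype_mismatches` are illustrative assumptions, not part of the notebook.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the train and test frames (hypothetical values).
train_toy = pd.DataFrame({"GarageCars": [2, 3, 1],
                          "LotArea": [8450, 9600, 11250]})
test_toy = pd.DataFrame({"GarageCars": [1.0, np.nan, 2.0],
                         "LotArea": [11622, 14267, 13830]})

def dtype_mismatches(left, right):
    """Return the shared columns whose dtypes differ between two frames."""
    shared = left.columns.intersection(right.columns)
    both = pd.DataFrame({
        "left": left.dtypes[shared].astype(str),
        "right": right.dtypes[shared].astype(str),
    })
    return both[both["left"] != both["right"]]

mismatches = dtype_mismatches(train_toy, test_toy)
print(mismatches)
```

On the real frames the call would be `dtype_mismatches(data, test_data)`; SalePrice is skipped automatically because it exists only in the train data.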
## EDA
## Plot scatter matrix for train data.
pd.plotting.scatter_matrix(data, figsize=(75, 75), diagonal='kde')
plt.show()
## Plot scatter matrix for test data.
pd.plotting.scatter_matrix(test_data, figsize=(75, 75), diagonal='kde')
plt.show()
## Plot correlation matrix for train data.
plt.figure(figsize=(75,75))
sns.heatmap(data.select_dtypes(include='number').corr(), cmap='coolwarm', annot=True)
plt.show()
## Plot correlation matrix for test data.
plt.figure(figsize=(75,75))
sns.heatmap(test_data.select_dtypes(include='number').corr(), cmap='coolwarm', annot=True)
plt.show()
## Plot probability plot for the target variable.
from scipy import stats
#Get also the QQ-plot
fig = plt.figure()
res = stats.probplot(data['SalePrice'], plot=plt)
plt.show()
## Display box plot of SalePrice for each value of GarageCars.
plt.figure(figsize=(16,8))
sns.boxplot(x='GarageCars',y='SalePrice',data=data)
plt.show()
## Display scatter plot with a fitted regression line for GarageArea vs. SalePrice.
sns.lmplot(x='GarageArea',y='SalePrice',data=data)
## Saleprice correlation matrix
k = 10 ## Number of variables for heatmap
plt.figure(figsize=(16,8))
corrmat = data.select_dtypes(include='number').corr()
## Pick the k columns most correlated with SalePrice (SalePrice itself is one of them).
cols = corrmat.nlargest(k, 'SalePrice')['SalePrice'].index
cm = np.corrcoef(data[cols].values.T)
sns.set(font_scale=1.25)
hm = sns.heatmap(cm, cbar=True, annot=True, square=True, fmt='.2f', annot_kws={'size': 10}, yticklabels=cols.values, xticklabels=cols.values)
plt.show()
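The selection step in the cell above (correlation matrix, then `nlargest` on the SalePrice column) can be tried on a small synthetic frame. This is only a sketch: the toy columns and the helper name `top_correlated` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic data: 'Quality' drives the target, 'Noise' does not.
rng = np.random.default_rng(0)
n = 200
quality = rng.normal(size=n)
toy = pd.DataFrame({
    "SalePrice": 3.0 * quality + rng.normal(scale=0.5, size=n),
    "Quality": quality,
    "Noise": rng.normal(size=n),
    "Label": ["a", "b"] * (n // 2),  # non-numeric column, excluded below
})

def top_correlated(df, target, k):
    """Return the k largest correlations with `target` (target included)."""
    corrmat = df.select_dtypes(include="number").corr()
    return corrmat.nlargest(k, target)[target]

top = top_correlated(toy, "SalePrice", k=2)
print(top)
```

Note that `nlargest` ranks by signed correlation, so a strongly negative feature would not be picked; ranking on `corrmat[target].abs()` would be one way to include such features.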
## Plot Histogram for target column.
sns.distplot(data['SalePrice']);
## Check skewness and kurtosis for the target variable.
print("Skewness: %f" % data['SalePrice'].skew())
print("Kurtosis: %f" % data['SalePrice'].kurt())
Skewness: 1.882876 Kurtosis: 6.536282
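A skewness well above zero means SalePrice has a long right tail. Since the competition metric is computed on the logarithm of the price, a log transform of the target is a common way to reduce that skew. The sketch below uses synthetic right-skewed prices (an assumption, not the real data) to show the effect of `np.log1p`.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed "prices" drawn from a log-normal distribution.
rng = np.random.default_rng(42)
prices = pd.Series(np.exp(rng.normal(loc=12.0, scale=0.4, size=2000)),
                   name="SalePrice")

# log1p(x) = log(1 + x); np.expm1 inverts it when converting predictions back.
log_prices = np.log1p(prices)

print("Skewness before: %f" % prices.skew())
print("Skewness after:  %f" % log_prices.skew())
```

On the real data the same check would be `np.log1p(data['SalePrice']).skew()`.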
## Scatter plot for some columns of train data.
sns.set()
cols = ['SalePrice', 'OverallQual', 'GrLivArea', 'GarageCars', 'TotalBsmtSF', 'YearBuilt']
sns.pairplot(data[cols], height=2.5)
plt.show();
## Histogram and normal probability plot.
from scipy.stats import norm
sns.distplot(data['SalePrice'], fit=norm);
fig = plt.figure()
res = stats.probplot(data['SalePrice'], plot=plt)
## Display correlations with SalePrice in descending order.
corr = data.select_dtypes(include=['int64','float64']).corr()
plt.figure(figsize=(16,6))
corr['SalePrice'].sort_values(ascending=False)[1:].plot(kind='bar')
plt.tight_layout()
## Visualize missing data.
missing_value = data.isnull().sum().sort_values(ascending=False) / len(data) * 100
missing_value = missing_value[missing_value != 0]
missing_value = pd.DataFrame({'Missing value' :missing_value,'Type':missing_value.index.map(lambda x:data[x].dtype)})
missing_value.plot(kind='bar',figsize=(16,4))
plt.show()
## Display distribution plots for the integer-valued columns.
quantitative = data.select_dtypes('int64').columns
f = pd.melt(data, value_vars=quantitative)
g = sns.FacetGrid(f, col="variable", col_wrap=2, sharex=False, sharey=False)
g = g.map(sns.distplot, "value")
## Display box plots of SalePrice for each categorical column.
qualitative = data.select_dtypes(['object','category']).columns
temp = data.copy()  ## Work on a copy so the train data itself is not modified.
for c in qualitative:
    temp[c] = temp[c].astype('category')
    if temp[c].isnull().any():
        temp[c] = temp[c].cat.add_categories(['MISSING'])
        temp[c] = temp[c].fillna('MISSING')
def boxplot(x, y, **kwargs):
    sns.boxplot(x=x, y=y)
    plt.xticks(rotation=90)
f = pd.melt(temp, id_vars=['SalePrice'], value_vars=qualitative)
g = sns.FacetGrid(f, col="variable", col_wrap=2, sharex=False, sharey=False, height=5)
g = g.map(boxplot, "value", "SalePrice")
## Plot histograms for train data.
(data.select_dtypes(include = ['float64', 'int64'])).hist(figsize=(16, 20), bins=50, xlabelsize=8, ylabelsize=8);
## Pair plots of SalePrice against the numeric columns, five columns at a time.
temp_1 = data.select_dtypes(include=['float64', 'int64'])
for i in range(0, len(temp_1.columns), 5):
    sns.pairplot(data=temp_1,
                 x_vars=temp_1.columns[i:i+5],
                 y_vars=['SalePrice'])
## Magic command to render plots inline in the notebook.
%matplotlib inline
## Get missing values for train data.
data.isna().sum()
Id 0 MSSubClass 0 MSZoning 0 LotFrontage 259 LotArea 0 Street 0 Alley 1369 LotShape 0 LandContour 0 Utilities 0 LotConfig 0 LandSlope 0 Neighborhood 0 Condition1 0 Condition2 0 BldgType 0 HouseStyle 0 OverallQual 0 OverallCond 0 YearBuilt 0 YearRemodAdd 0 RoofStyle 0 RoofMatl 0 Exterior1st 0 Exterior2nd 0 MasVnrType 8 MasVnrArea 8 ExterQual 0 ExterCond 0 Foundation 0 BsmtQual 37 BsmtCond 37 BsmtExposure 38 BsmtFinType1 37 BsmtFinSF1 0 BsmtFinType2 38 BsmtFinSF2 0 BsmtUnfSF 0 TotalBsmtSF 0 Heating 0 HeatingQC 0 CentralAir 0 Electrical 1 1stFlrSF 0 2ndFlrSF 0 LowQualFinSF 0 GrLivArea 0 BsmtFullBath 0 BsmtHalfBath 0 FullBath 0 HalfBath 0 BedroomAbvGr 0 KitchenAbvGr 0 KitchenQual 0 TotRmsAbvGrd 0 Functional 0 Fireplaces 0 FireplaceQu 690 GarageType 81 GarageYrBlt 81 GarageFinish 81 GarageCars 0 GarageArea 0 GarageQual 81 GarageCond 81 PavedDrive 0 WoodDeckSF 0 OpenPorchSF 0 EnclosedPorch 0 3SsnPorch 0 ScreenPorch 0 PoolArea 0 PoolQC 1453 Fence 1179 MiscFeature 1406 MiscVal 0 MoSold 0 YrSold 0 SaleType 0 SaleCondition 0 SalePrice 0 dtype: int64
## Get missing values for test data.
test_data.isna().sum()
Id 0 MSSubClass 0 MSZoning 4 LotFrontage 227 LotArea 0 Street 0 Alley 1352 LotShape 0 LandContour 0 Utilities 2 LotConfig 0 LandSlope 0 Neighborhood 0 Condition1 0 Condition2 0 BldgType 0 HouseStyle 0 OverallQual 0 OverallCond 0 YearBuilt 0 YearRemodAdd 0 RoofStyle 0 RoofMatl 0 Exterior1st 1 Exterior2nd 1 MasVnrType 16 MasVnrArea 15 ExterQual 0 ExterCond 0 Foundation 0 BsmtQual 44 BsmtCond 45 BsmtExposure 44 BsmtFinType1 42 BsmtFinSF1 1 BsmtFinType2 42 BsmtFinSF2 1 BsmtUnfSF 1 TotalBsmtSF 1 Heating 0 HeatingQC 0 CentralAir 0 Electrical 0 1stFlrSF 0 2ndFlrSF 0 LowQualFinSF 0 GrLivArea 0 BsmtFullBath 2 BsmtHalfBath 2 FullBath 0 HalfBath 0 BedroomAbvGr 0 KitchenAbvGr 0 KitchenQual 1 TotRmsAbvGrd 0 Functional 2 Fireplaces 0 FireplaceQu 730 GarageType 76 GarageYrBlt 78 GarageFinish 78 GarageCars 1 GarageArea 1 GarageQual 78 GarageCond 78 PavedDrive 0 WoodDeckSF 0 OpenPorchSF 0 EnclosedPorch 0 3SsnPorch 0 ScreenPorch 0 PoolArea 0 PoolQC 1456 Fence 1169 MiscFeature 1408 MiscVal 0 MoSold 0 YrSold 0 SaleType 1 SaleCondition 0 dtype: int64
### Find missing values % for train data.
missing_value = (data.isna().sum()/len(data)).round(4)*100
missing_value.sort_values(ascending=False)
#missing_value.count
PoolQC 99.52 MiscFeature 96.30 Alley 93.77 Fence 80.75 FireplaceQu 47.26 LotFrontage 17.74 GarageCond 5.55 GarageType 5.55 GarageYrBlt 5.55 GarageFinish 5.55 GarageQual 5.55 BsmtExposure 2.60 BsmtFinType2 2.60 BsmtFinType1 2.53 BsmtCond 2.53 BsmtQual 2.53 MasVnrArea 0.55 MasVnrType 0.55 Electrical 0.07 Utilities 0.00 YearRemodAdd 0.00 MSSubClass 0.00 Foundation 0.00 ExterCond 0.00 ExterQual 0.00 Exterior2nd 0.00 Exterior1st 0.00 RoofMatl 0.00 RoofStyle 0.00 YearBuilt 0.00 LotConfig 0.00 OverallCond 0.00 OverallQual 0.00 HouseStyle 0.00 BldgType 0.00 Condition2 0.00 BsmtFinSF1 0.00 MSZoning 0.00 LotArea 0.00 Street 0.00 Condition1 0.00 Neighborhood 0.00 LotShape 0.00 LandContour 0.00 LandSlope 0.00 SalePrice 0.00 HeatingQC 0.00 BsmtFinSF2 0.00 EnclosedPorch 0.00 Fireplaces 0.00 GarageCars 0.00 GarageArea 0.00 PavedDrive 0.00 WoodDeckSF 0.00 OpenPorchSF 0.00 3SsnPorch 0.00 BsmtUnfSF 0.00 ScreenPorch 0.00 PoolArea 0.00 MiscVal 0.00 MoSold 0.00 YrSold 0.00 SaleType 0.00 Functional 0.00 TotRmsAbvGrd 0.00 KitchenQual 0.00 KitchenAbvGr 0.00 BedroomAbvGr 0.00 HalfBath 0.00 FullBath 0.00 BsmtHalfBath 0.00 BsmtFullBath 0.00 GrLivArea 0.00 LowQualFinSF 0.00 2ndFlrSF 0.00 1stFlrSF 0.00 CentralAir 0.00 SaleCondition 0.00 Heating 0.00 TotalBsmtSF 0.00 Id 0.00 dtype: float64
### Find missing values % for test data.
missing_value_test = (test_data.isna().sum()/len(test_data)).round(4)*100
missing_value_test.sort_values(ascending=False)
PoolQC 99.79 MiscFeature 96.50 Alley 92.67 Fence 80.12 FireplaceQu 50.03 LotFrontage 15.56 GarageCond 5.35 GarageQual 5.35 GarageYrBlt 5.35 GarageFinish 5.35 GarageType 5.21 BsmtCond 3.08 BsmtQual 3.02 BsmtExposure 3.02 BsmtFinType1 2.88 BsmtFinType2 2.88 MasVnrType 1.10 MasVnrArea 1.03 MSZoning 0.27 BsmtHalfBath 0.14 Utilities 0.14 Functional 0.14 BsmtFullBath 0.14 BsmtFinSF2 0.07 BsmtFinSF1 0.07 Exterior2nd 0.07 BsmtUnfSF 0.07 TotalBsmtSF 0.07 SaleType 0.07 Exterior1st 0.07 KitchenQual 0.07 GarageArea 0.07 GarageCars 0.07 HouseStyle 0.00 LandSlope 0.00 MSSubClass 0.00 LotArea 0.00 Street 0.00 LotShape 0.00 LandContour 0.00 LotConfig 0.00 Neighborhood 0.00 BldgType 0.00 Condition1 0.00 Condition2 0.00 RoofMatl 0.00 RoofStyle 0.00 YearRemodAdd 0.00 YearBuilt 0.00 OverallCond 0.00 OverallQual 0.00 SaleCondition 0.00 Heating 0.00 ExterQual 0.00 TotRmsAbvGrd 0.00 YrSold 0.00 MoSold 0.00 MiscVal 0.00 PoolArea 0.00 ScreenPorch 0.00 3SsnPorch 0.00 EnclosedPorch 0.00 OpenPorchSF 0.00 WoodDeckSF 0.00 PavedDrive 0.00 Fireplaces 0.00 KitchenAbvGr 0.00 ExterCond 0.00 BedroomAbvGr 0.00 HalfBath 0.00 FullBath 0.00 GrLivArea 0.00 LowQualFinSF 0.00 2ndFlrSF 0.00 1stFlrSF 0.00 Electrical 0.00 CentralAir 0.00 HeatingQC 0.00 Foundation 0.00 Id 0.00 dtype: float64
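The two listings above include every column, most of them at 0.00. A small helper that keeps only the columns with missing values makes the result easier to read. This is a sketch on toy data; the helper name `missing_percent` is an assumption, not part of the notebook.

```python
import numpy as np
import pandas as pd

def missing_percent(df):
    """Percentage of missing values per column, non-zero entries only."""
    # isna().mean() is the fraction of NaN; scale first, then round.
    pct = (df.isna().mean() * 100).round(2)
    return pct[pct > 0].sort_values(ascending=False)

# Toy frame with 4 rows: 'C' has 3 NaN, 'A' has 1 NaN, 'B' has none.
toy = pd.DataFrame({
    "A": [1.0, np.nan, 3.0, 4.0],
    "B": [1, 2, 3, 4],
    "C": [np.nan, np.nan, np.nan, "x"],
})

result = missing_percent(toy)
print(result)
```

Rounding after multiplying by 100 also avoids floating-point artifacts that `.round(4)*100` can introduce.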
## Method returns, for each column, its data type, its distinct values (levels), its null count, and its number of unique values.
def Observations(df):
    return(pd.DataFrame({'dtypes' : df.dtypes,
                         'levels' : [df[x].unique() for x in df.columns],
                         'null_values' : df.isna().sum(),
                         'Unique Values': df.nunique()
                        }))
## Get data types, levels, null values, and unique value counts for each column of train data.
Observations(data)
| Column | dtypes | levels | null_values | Unique Values |
|---|---|---|---|---|
| Id | int64 | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14... | 0 | 1460 |
| MSSubClass | int64 | [60, 20, 70, 50, 190, 45, 90, 120, 30, 85, 80,... | 0 | 15 |
| MSZoning | object | [RL, RM, C (all), FV, RH] | 0 | 5 |
| LotFrontage | float64 | [65.0, 80.0, 68.0, 60.0, 84.0, 85.0, 75.0, nan... | 259 | 110 |
| LotArea | int64 | [8450, 9600, 11250, 9550, 14260, 14115, 10084,... | 0 | 1073 |
| Street | object | [Pave, Grvl] | 0 | 2 |
| Alley | object | [nan, Grvl, Pave] | 1369 | 2 |
| LotShape | object | [Reg, IR1, IR2, IR3] | 0 | 4 |
| LandContour | object | [Lvl, Bnk, Low, HLS] | 0 | 4 |
| Utilities | object | [AllPub, NoSeWa] | 0 | 2 |
| LotConfig | object | [Inside, FR2, Corner, CulDSac, FR3] | 0 | 5 |
| LandSlope | object | [Gtl, Mod, Sev] | 0 | 3 |
| Neighborhood | object | [CollgCr, Veenker, Crawfor, NoRidge, Mitchel, ... | 0 | 25 |
| Condition1 | object | [Norm, Feedr, PosN, Artery, RRAe, RRNn, RRAn, ... | 0 | 9 |
| Condition2 | object | [Norm, Artery, RRNn, Feedr, PosN, PosA, RRAn, ... | 0 | 8 |
| BldgType | object | [1Fam, 2fmCon, Duplex, TwnhsE, Twnhs] | 0 | 5 |
| HouseStyle | object | [2Story, 1Story, 1.5Fin, 1.5Unf, SFoyer, SLvl,... | 0 | 8 |
| OverallQual | int64 | [7, 6, 8, 5, 9, 4, 10, 3, 1, 2] | 0 | 10 |
| OverallCond | int64 | [5, 8, 6, 7, 4, 2, 3, 9, 1] | 0 | 9 |
| YearBuilt | int64 | [2003, 1976, 2001, 1915, 2000, 1993, 2004, 197... | 0 | 112 |
| YearRemodAdd | int64 | [2003, 1976, 2002, 1970, 2000, 1995, 2005, 197... | 0 | 61 |
| RoofStyle | object | [Gable, Hip, Gambrel, Mansard, Flat, Shed] | 0 | 6 |
| RoofMatl | object | [CompShg, WdShngl, Metal, WdShake, Membran, Ta... | 0 | 8 |
| Exterior1st | object | [VinylSd, MetalSd, Wd Sdng, HdBoard, BrkFace, ... | 0 | 15 |
| Exterior2nd | object | [VinylSd, MetalSd, Wd Shng, HdBoard, Plywood, ... | 0 | 16 |
| MasVnrType | object | [BrkFace, None, Stone, BrkCmn, nan] | 8 | 4 |
| MasVnrArea | float64 | [196.0, 0.0, 162.0, 350.0, 186.0, 240.0, 286.0... | 8 | 327 |
| ExterQual | object | [Gd, TA, Ex, Fa] | 0 | 4 |
| ExterCond | object | [TA, Gd, Fa, Po, Ex] | 0 | 5 |
| Foundation | object | [PConc, CBlock, BrkTil, Wood, Slab, Stone] | 0 | 6 |
| BsmtQual | object | [Gd, TA, Ex, nan, Fa] | 37 | 4 |
| BsmtCond | object | [TA, Gd, nan, Fa, Po] | 37 | 4 |
| BsmtExposure | object | [No, Gd, Mn, Av, nan] | 38 | 4 |
| BsmtFinType1 | object | [GLQ, ALQ, Unf, Rec, BLQ, nan, LwQ] | 37 | 6 |
| BsmtFinSF1 | int64 | [706, 978, 486, 216, 655, 732, 1369, 859, 0, 8... | 0 | 637 |
| BsmtFinType2 | object | [Unf, BLQ, nan, ALQ, Rec, LwQ, GLQ] | 38 | 6 |
| BsmtFinSF2 | int64 | [0, 32, 668, 486, 93, 491, 506, 712, 362, 41, ... | 0 | 144 |
| BsmtUnfSF | int64 | [150, 284, 434, 540, 490, 64, 317, 216, 952, 1... | 0 | 780 |
| TotalBsmtSF | int64 | [856, 1262, 920, 756, 1145, 796, 1686, 1107, 9... | 0 | 721 |
| Heating | object | [GasA, GasW, Grav, Wall, OthW, Floor] | 0 | 6 |
| HeatingQC | object | [Ex, Gd, TA, Fa, Po] | 0 | 5 |
| CentralAir | object | [Y, N] | 0 | 2 |
| Electrical | object | [SBrkr, FuseF, FuseA, FuseP, Mix, nan] | 1 | 5 |
| 1stFlrSF | int64 | [856, 1262, 920, 961, 1145, 796, 1694, 1107, 1... | 0 | 753 |
| 2ndFlrSF | int64 | [854, 0, 866, 756, 1053, 566, 983, 752, 1142, ... | 0 | 417 |
| LowQualFinSF | int64 | [0, 360, 513, 234, 528, 572, 144, 392, 371, 39... | 0 | 24 |
| GrLivArea | int64 | [1710, 1262, 1786, 1717, 2198, 1362, 1694, 209... | 0 | 861 |
| BsmtFullBath | int64 | [1, 0, 2, 3] | 0 | 4 |
| BsmtHalfBath | int64 | [0, 1, 2] | 0 | 3 |
| FullBath | int64 | [2, 1, 3, 0] | 0 | 4 |
| HalfBath | int64 | [1, 0, 2] | 0 | 3 |
| BedroomAbvGr | int64 | [3, 4, 1, 2, 0, 5, 6, 8] | 0 | 8 |
| KitchenAbvGr | int64 | [1, 2, 3, 0] | 0 | 4 |
| KitchenQual | object | [Gd, TA, Ex, Fa] | 0 | 4 |
| TotRmsAbvGrd | int64 | [8, 6, 7, 9, 5, 11, 4, 10, 12, 3, 2, 14] | 0 | 12 |
| Functional | object | [Typ, Min1, Maj1, Min2, Mod, Maj2, Sev] | 0 | 7 |
| Fireplaces | int64 | [0, 1, 2, 3] | 0 | 4 |
| FireplaceQu | object | [nan, TA, Gd, Fa, Ex, Po] | 690 | 5 |
| GarageType | object | [Attchd, Detchd, BuiltIn, CarPort, nan, Basmen... | 81 | 6 |
| GarageYrBlt | float64 | [2003.0, 1976.0, 2001.0, 1998.0, 2000.0, 1993.... | 81 | 97 |
| GarageFinish | object | [RFn, Unf, Fin, nan] | 81 | 3 |
| GarageCars | int64 | [2, 3, 1, 0, 4] | 0 | 5 |
| GarageArea | int64 | [548, 460, 608, 642, 836, 480, 636, 484, 468, ... | 0 | 441 |
| GarageQual | object | [TA, Fa, Gd, nan, Ex, Po] | 81 | 5 |
| GarageCond | object | [TA, Fa, nan, Gd, Po, Ex] | 81 | 5 |
| PavedDrive | object | [Y, N, P] | 0 | 3 |
| WoodDeckSF | int64 | [0, 298, 192, 40, 255, 235, 90, 147, 140, 160,... | 0 | 274 |
| OpenPorchSF | int64 | [61, 0, 42, 35, 84, 30, 57, 204, 4, 21, 33, 21... | 0 | 202 |
| EnclosedPorch | int64 | [0, 272, 228, 205, 176, 87, 172, 102, 37, 144,... | 0 | 120 |
| 3SsnPorch | int64 | [0, 320, 407, 130, 180, 168, 140, 508, 238, 24... | 0 | 20 |
| ScreenPorch | int64 | [0, 176, 198, 291, 252, 99, 184, 168, 130, 142... | 0 | 76 |
| PoolArea | int64 | [0, 512, 648, 576, 555, 480, 519, 738] | 0 | 8 |
| PoolQC | object | [nan, Ex, Fa, Gd] | 1453 | 3 |
| Fence | object | [nan, MnPrv, GdWo, GdPrv, MnWw] | 1179 | 4 |
| MiscFeature | object | [nan, Shed, Gar2, Othr, TenC] | 1406 | 4 |
| MiscVal | int64 | [0, 700, 350, 500, 400, 480, 450, 15500, 1200,... | 0 | 21 |
| MoSold | int64 | [2, 5, 9, 12, 10, 8, 11, 4, 1, 7, 3, 6] | 0 | 12 |
| YrSold | int64 | [2008, 2007, 2006, 2009, 2010] | 0 | 5 |
| SaleType | object | [WD, New, COD, ConLD, ConLI, CWD, ConLw, Con, ... | 0 | 9 |
| SaleCondition | object | [Normal, Abnorml, Partial, AdjLand, Alloca, Fa... | 0 | 6 |
| SalePrice | int64 | [208500, 181500, 223500, 140000, 250000, 14300... | 0 | 663 |
## Get data types, levels, null values, and unique value counts for each column of test data.
Observations(test_data)
| Column | dtypes | levels | null_values | Unique Values |
|---|---|---|---|---|
| Id | int64 | [1461, 1462, 1463, 1464, 1465, 1466, 1467, 146... | 0 | 1459 |
| MSSubClass | int64 | [20, 60, 120, 160, 80, 30, 50, 90, 85, 190, 45... | 0 | 16 |
| MSZoning | object | [RH, RL, RM, FV, C (all), nan] | 4 | 5 |
| LotFrontage | float64 | [80.0, 81.0, 74.0, 78.0, 43.0, 75.0, nan, 63.0... | 227 | 115 |
| LotArea | int64 | [11622, 14267, 13830, 9978, 5005, 10000, 7980,... | 0 | 1106 |
| Street | object | [Pave, Grvl] | 0 | 2 |
| Alley | object | [nan, Pave, Grvl] | 1352 | 2 |
| LotShape | object | [Reg, IR1, IR2, IR3] | 0 | 4 |
| LandContour | object | [Lvl, HLS, Bnk, Low] | 0 | 4 |
| Utilities | object | [AllPub, nan] | 2 | 1 |
| LotConfig | object | [Inside, Corner, FR2, CulDSac, FR3] | 0 | 5 |
| LandSlope | object | [Gtl, Mod, Sev] | 0 | 3 |
| Neighborhood | object | [NAmes, Gilbert, StoneBr, BrDale, NPkVill, Nri... | 0 | 25 |
| Condition1 | object | [Feedr, Norm, PosN, RRNe, Artery, RRNn, PosA, ... | 0 | 9 |
| Condition2 | object | [Norm, Feedr, PosA, PosN, Artery] | 0 | 5 |
| BldgType | object | [1Fam, TwnhsE, Twnhs, Duplex, 2fmCon] | 0 | 5 |
| HouseStyle | object | [1Story, 2Story, SLvl, 1.5Fin, SFoyer, 2.5Unf,... | 0 | 7 |
| OverallQual | int64 | [5, 6, 8, 7, 4, 9, 2, 3, 10, 1] | 0 | 10 |
| OverallCond | int64 | [6, 5, 7, 8, 2, 9, 3, 4, 1] | 0 | 9 |
| YearBuilt | int64 | [1961, 1958, 1997, 1998, 1992, 1993, 1990, 197... | 0 | 106 |
| YearRemodAdd | int64 | [1961, 1958, 1998, 1992, 1994, 2007, 1990, 197... | 0 | 61 |
| RoofStyle | object | [Gable, Hip, Gambrel, Flat, Mansard, Shed] | 0 | 6 |
| RoofMatl | object | [CompShg, Tar&Grv, WdShake, WdShngl] | 0 | 4 |
| Exterior1st | object | [VinylSd, Wd Sdng, HdBoard, Plywood, MetalSd, ... | 1 | 13 |
| Exterior2nd | object | [VinylSd, Wd Sdng, HdBoard, Plywood, MetalSd, ... | 1 | 15 |
| MasVnrType | object | [None, BrkFace, Stone, BrkCmn, nan] | 16 | 4 |
| MasVnrArea | float64 | [0.0, 108.0, 20.0, 504.0, 492.0, 162.0, 256.0,... | 15 | 303 |
| ExterQual | object | [TA, Gd, Ex, Fa] | 0 | 4 |
| ExterCond | object | [TA, Gd, Fa, Po, Ex] | 0 | 5 |
| Foundation | object | [CBlock, PConc, BrkTil, Stone, Slab, Wood] | 0 | 6 |
| BsmtQual | object | [TA, Gd, Ex, Fa, nan] | 44 | 4 |
| BsmtCond | object | [TA, Po, Fa, Gd, nan] | 45 | 4 |
| BsmtExposure | object | [No, Gd, Mn, Av, nan] | 44 | 4 |
| BsmtFinType1 | object | [Rec, ALQ, GLQ, Unf, BLQ, LwQ, nan] | 42 | 6 |
| BsmtFinSF1 | float64 | [468.0, 923.0, 791.0, 602.0, 263.0, 0.0, 935.0... | 1 | 669 |
| BsmtFinType2 | object | [LwQ, Unf, Rec, BLQ, GLQ, ALQ, nan] | 42 | 6 |
| BsmtFinSF2 | float64 | [144.0, 0.0, 78.0, 859.0, 981.0, 42.0, 46.0, 1... | 1 | 161 |
| BsmtUnfSF | float64 | [270.0, 406.0, 137.0, 324.0, 1017.0, 763.0, 23... | 1 | 793 |
| TotalBsmtSF | float64 | [882.0, 1329.0, 928.0, 926.0, 1280.0, 763.0, 1... | 1 | 736 |
| Heating | object | [GasA, GasW, Grav, Wall] | 0 | 4 |
| HeatingQC | object | [TA, Gd, Ex, Fa, Po] | 0 | 5 |
| CentralAir | object | [Y, N] | 0 | 2 |
| Electrical | object | [SBrkr, FuseA, FuseF, FuseP] | 0 | 4 |
| 1stFlrSF | int64 | [896, 1329, 928, 926, 1280, 763, 1187, 789, 13... | 0 | 789 |
| 2ndFlrSF | int64 | [0, 701, 678, 892, 676, 504, 567, 601, 707, 56... | 0 | 407 |
| LowQualFinSF | int64 | [0, 362, 1064, 431, 436, 259, 312, 108, 697, 5... | 0 | 15 |
| GrLivArea | int64 | [896, 1329, 1629, 1604, 1280, 1655, 1187, 1465... | 0 | 879 |
| BsmtFullBath | float64 | [0.0, 1.0, 2.0, 3.0, nan] | 2 | 4 |
| BsmtHalfBath | float64 | [0.0, 1.0, nan, 2.0] | 2 | 3 |
| FullBath | int64 | [1, 2, 3, 4, 0] | 0 | 5 |
| HalfBath | int64 | [0, 1, 2] | 0 | 3 |
| BedroomAbvGr | int64 | [2, 3, 4, 1, 6, 5, 0] | 0 | 7 |
| KitchenAbvGr | int64 | [1, 2, 0] | 0 | 3 |
| KitchenQual | object | [TA, Gd, Ex, Fa, nan] | 1 | 4 |
| TotRmsAbvGrd | int64 | [5, 6, 7, 4, 10, 8, 9, 3, 12, 11, 13, 15] | 0 | 12 |
| Functional | object | [Typ, Min2, Min1, Mod, Maj1, Sev, Maj2, nan] | 2 | 7 |
| Fireplaces | int64 | [0, 1, 2, 3, 4] | 0 | 5 |
| FireplaceQu | object | [nan, TA, Gd, Po, Fa, Ex] | 730 | 5 |
| GarageType | object | [Attchd, Detchd, BuiltIn, nan, Basment, 2Types... | 76 | 6 |
| GarageYrBlt | float64 | [1961.0, 1958.0, 1997.0, 1998.0, 1992.0, 1993.... | 78 | 97 |
| GarageFinish | object | [Unf, Fin, RFn, nan] | 78 | 3 |
| GarageCars | float64 | [1.0, 2.0, 3.0, 0.0, 4.0, 5.0, nan] | 1 | 6 |
| GarageArea | float64 | [730.0, 312.0, 482.0, 470.0, 506.0, 440.0, 420... | 1 | 459 |
| GarageQual | object | [TA, nan, Fa, Gd, Po] | 78 | 4 |
| GarageCond | object | [TA, nan, Fa, Gd, Po, Ex] | 78 | 5 |
| PavedDrive | object | [Y, N, P] | 0 | 3 |
| WoodDeckSF | int64 | [140, 393, 212, 360, 0, 157, 483, 192, 240, 20... | 0 | 263 |
| OpenPorchSF | int64 | [0, 36, 34, 82, 84, 21, 75, 68, 30, 133, 35, 7... | 0 | 203 |
| EnclosedPorch | int64 | [0, 80, 186, 120, 150, 205, 113, 135, 126, 334... | 0 | 131 |
| 3SsnPorch | int64 | [0, 224, 255, 225, 360, 150, 153, 174, 120, 21... | 0 | 13 |
| ScreenPorch | int64 | [120, 0, 144, 256, 216, 204, 160, 240, 148, 16... | 0 | 75 |
| PoolArea | int64 | [0, 144, 368, 444, 228, 561, 800] | 0 | 7 |
| PoolQC | object | [nan, Ex, Gd] | 1456 | 2 |
| Fence | object | [MnPrv, nan, GdPrv, GdWo, MnWw] | 1169 | 4 |
| MiscFeature | object | [nan, Gar2, Shed, Othr] | 1408 | 3 |
| MiscVal | int64 | [0, 12500, 500, 1500, 300, 450, 80, 600, 490, ... | 0 | 26 |
| MoSold | int64 | [6, 3, 1, 4, 5, 2, 7, 10, 8, 11, 9, 12] | 0 | 12 |
| YrSold | int64 | [2010, 2009, 2008, 2007, 2006] | 0 | 5 |
| SaleType | object | [WD, COD, New, ConLD, Oth, Con, ConLw, ConLI, ... | 1 | 9 |
| SaleCondition | object | [Normal, Partial, Abnorml, Family, Alloca, Adj... | 0 | 6 |
## Check dimensions of train data.
data.shape
(1460, 81)
## Check dimensions of test data.
test_data.shape
(1459, 80)
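The two shapes differ by exactly one column, which is what one would expect if the only extra train column is the target, SalePrice. A minimal sketch of how that could be confirmed, using tiny stand-in frames instead of the real files; the helper name `column_difference` is hypothetical and not part of this notebook:

```python
import pandas as pd


def column_difference(train: pd.DataFrame, test: pd.DataFrame) -> dict:
    """Return the column names that appear in only one of the two frames."""
    train_cols, test_cols = set(train.columns), set(test.columns)
    return {
        "only_in_train": sorted(train_cols - test_cols),
        "only_in_test": sorted(test_cols - train_cols),
    }


# Stand-in frames (assumption); in the notebook these would be `data` and `test_data`.
toy_train = pd.DataFrame({"Id": [1, 2], "LotArea": [8450, 9600], "SalePrice": [208500, 181500]})
toy_test = pd.DataFrame({"Id": [1461], "LotArea": [11622]})

diff = column_difference(toy_train, toy_test)
print(diff)
```

Called as `column_difference(data, test_data)`, the same helper would list any column that is missing from one of the two sets.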
### Replace NaN in the Alley column with an explicit label
#data.Alley.replace(to_replace=dict(nan='NAC'), inplace=True)
#data.Alley.replace(['NaN'], ['NAC'], inplace=True)
#data.Alley[data.Alley == 'nan'] = 'NAC'
#data.Alley.replace(to_replace ="nan", value ="NAC", inplace=True)
## Replace NA with 'NAA' in the Alley column of the train data, because NA here is a real level (no alley access) rather than a missing value.
data.Alley.fillna('NAA',inplace=True)
## Replace NA with 'NAA' in the Alley column of the test data for the same reason.
test_data.Alley.fillna('NAA',inplace=True)
#data.Alley.replace(to_replace = np.nan, value ='NAC', inplace=True)
## Check unique values for Alley column of train data.
data.Alley.unique()
array(['NAA', 'Grvl', 'Pave'], dtype=object)
## Check unique values for Alley column of test data.
test_data.Alley.unique()
array(['NAA', 'Pave', 'Grvl'], dtype=object)
## Check first 5 records of train data.
data.head()
| | Id | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | LotConfig | LandSlope | Neighborhood | Condition1 | Condition2 | BldgType | HouseStyle | OverallQual | OverallCond | YearBuilt | YearRemodAdd | RoofStyle | RoofMatl | Exterior1st | Exterior2nd | MasVnrType | MasVnrArea | ExterQual | ExterCond | Foundation | BsmtQual | BsmtCond | BsmtExposure | BsmtFinType1 | BsmtFinSF1 | BsmtFinType2 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | Heating | HeatingQC | CentralAir | Electrical | 1stFlrSF | 2ndFlrSF | LowQualFinSF | GrLivArea | BsmtFullBath | BsmtHalfBath | FullBath | HalfBath | BedroomAbvGr | KitchenAbvGr | KitchenQual | TotRmsAbvGrd | Functional | Fireplaces | FireplaceQu | GarageType | GarageYrBlt | GarageFinish | GarageCars | GarageArea | GarageQual | GarageCond | PavedDrive | WoodDeckSF | OpenPorchSF | EnclosedPorch | 3SsnPorch | ScreenPorch | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 60 | RL | 65.0 | 8450 | Pave | NAA | Reg | Lvl | AllPub | Inside | Gtl | CollgCr | Norm | Norm | 1Fam | 2Story | 7 | 5 | 2003 | 2003 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 196.0 | Gd | TA | PConc | Gd | TA | No | GLQ | 706 | Unf | 0 | 150 | 856 | GasA | Ex | Y | SBrkr | 856 | 854 | 0 | 1710 | 1 | 0 | 2 | 1 | 3 | 1 | Gd | 8 | Typ | 0 | NaN | Attchd | 2003.0 | RFn | 2 | 548 | TA | TA | Y | 0 | 61 | 0 | 0 | 0 | 0 | NaN | NaN | NaN | 0 | 2 | 2008 | WD | Normal | 208500 |
| 1 | 2 | 20 | RL | 80.0 | 9600 | Pave | NAA | Reg | Lvl | AllPub | FR2 | Gtl | Veenker | Feedr | Norm | 1Fam | 1Story | 6 | 8 | 1976 | 1976 | Gable | CompShg | MetalSd | MetalSd | None | 0.0 | TA | TA | CBlock | Gd | TA | Gd | ALQ | 978 | Unf | 0 | 284 | 1262 | GasA | Ex | Y | SBrkr | 1262 | 0 | 0 | 1262 | 0 | 1 | 2 | 0 | 3 | 1 | TA | 6 | Typ | 1 | TA | Attchd | 1976.0 | RFn | 2 | 460 | TA | TA | Y | 298 | 0 | 0 | 0 | 0 | 0 | NaN | NaN | NaN | 0 | 5 | 2007 | WD | Normal | 181500 |
| 2 | 3 | 60 | RL | 68.0 | 11250 | Pave | NAA | IR1 | Lvl | AllPub | Inside | Gtl | CollgCr | Norm | Norm | 1Fam | 2Story | 7 | 5 | 2001 | 2002 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 162.0 | Gd | TA | PConc | Gd | TA | Mn | GLQ | 486 | Unf | 0 | 434 | 920 | GasA | Ex | Y | SBrkr | 920 | 866 | 0 | 1786 | 1 | 0 | 2 | 1 | 3 | 1 | Gd | 6 | Typ | 1 | TA | Attchd | 2001.0 | RFn | 2 | 608 | TA | TA | Y | 0 | 42 | 0 | 0 | 0 | 0 | NaN | NaN | NaN | 0 | 9 | 2008 | WD | Normal | 223500 |
| 3 | 4 | 70 | RL | 60.0 | 9550 | Pave | NAA | IR1 | Lvl | AllPub | Corner | Gtl | Crawfor | Norm | Norm | 1Fam | 2Story | 7 | 5 | 1915 | 1970 | Gable | CompShg | Wd Sdng | Wd Shng | None | 0.0 | TA | TA | BrkTil | TA | Gd | No | ALQ | 216 | Unf | 0 | 540 | 756 | GasA | Gd | Y | SBrkr | 961 | 756 | 0 | 1717 | 1 | 0 | 1 | 0 | 3 | 1 | Gd | 7 | Typ | 1 | Gd | Detchd | 1998.0 | Unf | 3 | 642 | TA | TA | Y | 0 | 35 | 272 | 0 | 0 | 0 | NaN | NaN | NaN | 0 | 2 | 2006 | WD | Abnorml | 140000 |
| 4 | 5 | 60 | RL | 84.0 | 14260 | Pave | NAA | IR1 | Lvl | AllPub | FR2 | Gtl | NoRidge | Norm | Norm | 1Fam | 2Story | 8 | 5 | 2000 | 2000 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 350.0 | Gd | TA | PConc | Gd | TA | Av | GLQ | 655 | Unf | 0 | 490 | 1145 | GasA | Ex | Y | SBrkr | 1145 | 1053 | 0 | 2198 | 1 | 0 | 2 | 1 | 4 | 1 | Gd | 9 | Typ | 1 | TA | Attchd | 2000.0 | RFn | 3 | 836 | TA | TA | Y | 192 | 84 | 0 | 0 | 0 | 0 | NaN | NaN | NaN | 0 | 12 | 2008 | WD | Normal | 250000 |
## Fill NAs with NB value for BsmtQual column of train data.
#data.BsmtQual.replace(to_replace = np.nan, value ='NB', inplace=True)
data.BsmtQual.fillna('NB',inplace=True)
## Fill NAs with NB value for BsmtQual column of test data.
test_data.BsmtQual.fillna('NB',inplace=True)
## Fill NAs with NB value for BsmtCond column of train data.
#data.BsmtCond.replace(to_replace = np.nan, value ='NB', inplace=True)
data.BsmtCond.fillna('NB',inplace=True)
## Fill NAs with NB value for BsmtCond column of test data.
test_data.BsmtCond.fillna('NB',inplace=True)
## Fill NAs with NB value for BsmtExposure column of train data.
#data.BsmtExposure.replace(to_replace = np.nan, value ='NB', inplace=True)
data.BsmtExposure.fillna('NB',inplace=True)
## Fill NAs with NB value for BsmtExposure column of test data.
test_data.BsmtExposure.fillna('NB',inplace=True)
## Fill NAs with NB value for BsmtFinType1 column of train data.
#data.BsmtFinType1.replace(to_replace = np.nan, value ='NB', inplace=True)
data.BsmtFinType1.fillna('NB',inplace=True)
## Fill NAs with NB value for BsmtFinType1 column of test data.
test_data.BsmtFinType1.fillna('NB',inplace=True)
## Fill NAs with NB value for BsmtFinType2 column of train data.
#data.BsmtFinType2.replace(to_replace = np.nan, value ='NB', inplace=True)
data.BsmtFinType2.fillna('NB',inplace=True)
## Fill NAs with NB value for BsmtFinType2 column of test data.
test_data.BsmtFinType2.fillna('NB',inplace=True)
## Fill NAs with NF value for FireplaceQu column of train data.
#data.FireplaceQu.replace(to_replace = np.nan, value ='NF', inplace=True)
data.FireplaceQu.fillna('NF',inplace=True)
## Fill NAs with NF value for FireplaceQu column of test data.
test_data.FireplaceQu.fillna('NF',inplace=True)
## Fill NAs with NG value for GarageType column of train data.
data.GarageType.fillna('NG',inplace=True)
## Fill NAs with NG value for GarageType column of test data.
test_data.GarageType.fillna('NG',inplace=True)
## Fill NAs with NG value for GarageFinish column of train data.
data.GarageFinish.fillna('NG',inplace=True)
## Fill NAs with NG value for GarageFinish column of test data.
test_data.GarageFinish.fillna('NG',inplace=True)
## Fill NAs with NG value for GarageQual column of train data.
data.GarageQual.fillna('NG',inplace=True)
## Fill NAs with NG value for GarageQual column of test data.
test_data.GarageQual.fillna('NG',inplace=True)
## Fill NAs with NG value for GarageCond column of train data.
data.GarageCond.fillna('NG',inplace=True)
## Fill NAs with NG value for GarageCond column of test data.
test_data.GarageCond.fillna('NG',inplace=True)
## Fill NAs with NP value for PoolQC column of train data.
data.PoolQC.fillna('NP',inplace=True)
## Fill NAs with NP value for PoolQC column of test data.
test_data.PoolQC.fillna('NP',inplace=True)
## Fill NAs with NF value for Fence column of train data.
data.Fence.fillna('NF',inplace=True)
## Fill NAs with NF value for Fence column of test data.
test_data.Fence.fillna('NF',inplace=True)
## Fill NAs with NE value for MiscFeature column of train data.
data.MiscFeature.fillna('NE',inplace=True)
## Fill NAs with NE value for MiscFeature column of test data.
test_data.MiscFeature.fillna('NE',inplace=True)
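The cells above repeat one pattern: for columns where NA means "feature absent", replace NaN with a short label. A minimal sketch of the same idea driven by a single mapping, assuming the labels used above; the names `ABSENT_LABELS` and `fill_absent_labels` are hypothetical, and the frame is a toy stand-in. The helper returns a new frame instead of calling `fillna(..., inplace=True)` on an attribute, a pattern that recent pandas versions may warn about.

```python
import numpy as np
import pandas as pd

# Label for "feature absent" in each column, copied from the cells above.
ABSENT_LABELS = {
    "Alley": "NAA",
    "BsmtQual": "NB", "BsmtCond": "NB", "BsmtExposure": "NB",
    "BsmtFinType1": "NB", "BsmtFinType2": "NB",
    "FireplaceQu": "NF",
    "GarageType": "NG", "GarageFinish": "NG", "GarageQual": "NG", "GarageCond": "NG",
    "PoolQC": "NP", "Fence": "NF", "MiscFeature": "NE",
}


def fill_absent_labels(df: pd.DataFrame, labels: dict) -> pd.DataFrame:
    """Return a copy of `df` with NaN replaced by the per-column label."""
    # Only keep the columns that actually exist in this frame.
    present = {col: label for col, label in labels.items() if col in df.columns}
    return df.fillna(value=present)


# Toy stand-in (assumption); in the notebook this would be `data` or `test_data`.
toy = pd.DataFrame({"Alley": [np.nan, "Grvl"], "PoolQC": [np.nan, "Ex"], "LotArea": [8450, 9600]})
filled = fill_absent_labels(toy, ABSENT_LABELS)
print(filled)
```

Applied as `data = fill_absent_labels(data, ABSENT_LABELS)` and likewise for `test_data`, this would keep the train and test labels in sync from one place.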
## Display first record of train data.
data[:1]
| | Id | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | LotConfig | LandSlope | Neighborhood | Condition1 | Condition2 | BldgType | HouseStyle | OverallQual | OverallCond | YearBuilt | YearRemodAdd | RoofStyle | RoofMatl | Exterior1st | Exterior2nd | MasVnrType | MasVnrArea | ExterQual | ExterCond | Foundation | BsmtQual | BsmtCond | BsmtExposure | BsmtFinType1 | BsmtFinSF1 | BsmtFinType2 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | Heating | HeatingQC | CentralAir | Electrical | 1stFlrSF | 2ndFlrSF | LowQualFinSF | GrLivArea | BsmtFullBath | BsmtHalfBath | FullBath | HalfBath | BedroomAbvGr | KitchenAbvGr | KitchenQual | TotRmsAbvGrd | Functional | Fireplaces | FireplaceQu | GarageType | GarageYrBlt | GarageFinish | GarageCars | GarageArea | GarageQual | GarageCond | PavedDrive | WoodDeckSF | OpenPorchSF | EnclosedPorch | 3SsnPorch | ScreenPorch | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 60 | RL | 65.0 | 8450 | Pave | NAA | Reg | Lvl | AllPub | Inside | Gtl | CollgCr | Norm | Norm | 1Fam | 2Story | 7 | 5 | 2003 | 2003 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 196.0 | Gd | TA | PConc | Gd | TA | No | GLQ | 706 | Unf | 0 | 150 | 856 | GasA | Ex | Y | SBrkr | 856 | 854 | 0 | 1710 | 1 | 0 | 2 | 1 | 3 | 1 | Gd | 8 | Typ | 0 | NF | Attchd | 2003.0 | RFn | 2 | 548 | TA | TA | Y | 0 | 61 | 0 | 0 | 0 | 0 | NP | NF | NE | 0 | 2 | 2008 | WD | Normal | 208500 |
## Display first record of test data.
test_data[:1]
| | Id | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | LotConfig | LandSlope | Neighborhood | Condition1 | Condition2 | BldgType | HouseStyle | OverallQual | OverallCond | YearBuilt | YearRemodAdd | RoofStyle | RoofMatl | Exterior1st | Exterior2nd | MasVnrType | MasVnrArea | ExterQual | ExterCond | Foundation | BsmtQual | BsmtCond | BsmtExposure | BsmtFinType1 | BsmtFinSF1 | BsmtFinType2 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | Heating | HeatingQC | CentralAir | Electrical | 1stFlrSF | 2ndFlrSF | LowQualFinSF | GrLivArea | BsmtFullBath | BsmtHalfBath | FullBath | HalfBath | BedroomAbvGr | KitchenAbvGr | KitchenQual | TotRmsAbvGrd | Functional | Fireplaces | FireplaceQu | GarageType | GarageYrBlt | GarageFinish | GarageCars | GarageArea | GarageQual | GarageCond | PavedDrive | WoodDeckSF | OpenPorchSF | EnclosedPorch | 3SsnPorch | ScreenPorch | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1461 | 20 | RH | 80.0 | 11622 | Pave | NAA | Reg | Lvl | AllPub | Inside | Gtl | NAmes | Feedr | Norm | 1Fam | 1Story | 5 | 6 | 1961 | 1961 | Gable | CompShg | VinylSd | VinylSd | None | 0.0 | TA | TA | CBlock | TA | TA | No | Rec | 468.0 | LwQ | 144.0 | 270.0 | 882.0 | GasA | TA | Y | SBrkr | 896 | 0 | 0 | 896 | 0.0 | 0.0 | 1 | 0 | 2 | 1 | TA | 5 | Typ | 0 | NF | Attchd | 1961.0 | Unf | 1.0 | 730.0 | TA | TA | Y | 140 | 0 | 0 | 0 | 120 | 0 | NP | MnPrv | NE | 0 | 6 | 2010 | WD | Normal |
## Summarise the dtype, levels, null count and unique-value count of each train data column.
Observations(data)
| | dtypes | levels | null_values | Unique Values |
|---|---|---|---|---|
| Id | int64 | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14... | 0 | 1460 |
| MSSubClass | int64 | [60, 20, 70, 50, 190, 45, 90, 120, 30, 85, 80,... | 0 | 15 |
| MSZoning | object | [RL, RM, C (all), FV, RH] | 0 | 5 |
| LotFrontage | float64 | [65.0, 80.0, 68.0, 60.0, 84.0, 85.0, 75.0, nan... | 259 | 110 |
| LotArea | int64 | [8450, 9600, 11250, 9550, 14260, 14115, 10084,... | 0 | 1073 |
| Street | object | [Pave, Grvl] | 0 | 2 |
| Alley | object | [NAA, Grvl, Pave] | 0 | 3 |
| LotShape | object | [Reg, IR1, IR2, IR3] | 0 | 4 |
| LandContour | object | [Lvl, Bnk, Low, HLS] | 0 | 4 |
| Utilities | object | [AllPub, NoSeWa] | 0 | 2 |
| LotConfig | object | [Inside, FR2, Corner, CulDSac, FR3] | 0 | 5 |
| LandSlope | object | [Gtl, Mod, Sev] | 0 | 3 |
| Neighborhood | object | [CollgCr, Veenker, Crawfor, NoRidge, Mitchel, ... | 0 | 25 |
| Condition1 | object | [Norm, Feedr, PosN, Artery, RRAe, RRNn, RRAn, ... | 0 | 9 |
| Condition2 | object | [Norm, Artery, RRNn, Feedr, PosN, PosA, RRAn, ... | 0 | 8 |
| BldgType | object | [1Fam, 2fmCon, Duplex, TwnhsE, Twnhs] | 0 | 5 |
| HouseStyle | object | [2Story, 1Story, 1.5Fin, 1.5Unf, SFoyer, SLvl,... | 0 | 8 |
| OverallQual | int64 | [7, 6, 8, 5, 9, 4, 10, 3, 1, 2] | 0 | 10 |
| OverallCond | int64 | [5, 8, 6, 7, 4, 2, 3, 9, 1] | 0 | 9 |
| YearBuilt | int64 | [2003, 1976, 2001, 1915, 2000, 1993, 2004, 197... | 0 | 112 |
| YearRemodAdd | int64 | [2003, 1976, 2002, 1970, 2000, 1995, 2005, 197... | 0 | 61 |
| RoofStyle | object | [Gable, Hip, Gambrel, Mansard, Flat, Shed] | 0 | 6 |
| RoofMatl | object | [CompShg, WdShngl, Metal, WdShake, Membran, Ta... | 0 | 8 |
| Exterior1st | object | [VinylSd, MetalSd, Wd Sdng, HdBoard, BrkFace, ... | 0 | 15 |
| Exterior2nd | object | [VinylSd, MetalSd, Wd Shng, HdBoard, Plywood, ... | 0 | 16 |
| MasVnrType | object | [BrkFace, None, Stone, BrkCmn, nan] | 8 | 4 |
| MasVnrArea | float64 | [196.0, 0.0, 162.0, 350.0, 186.0, 240.0, 286.0... | 8 | 327 |
| ExterQual | object | [Gd, TA, Ex, Fa] | 0 | 4 |
| ExterCond | object | [TA, Gd, Fa, Po, Ex] | 0 | 5 |
| Foundation | object | [PConc, CBlock, BrkTil, Wood, Slab, Stone] | 0 | 6 |
| BsmtQual | object | [Gd, TA, Ex, NB, Fa] | 0 | 5 |
| BsmtCond | object | [TA, Gd, NB, Fa, Po] | 0 | 5 |
| BsmtExposure | object | [No, Gd, Mn, Av, NB] | 0 | 5 |
| BsmtFinType1 | object | [GLQ, ALQ, Unf, Rec, BLQ, NB, LwQ] | 0 | 7 |
| BsmtFinSF1 | int64 | [706, 978, 486, 216, 655, 732, 1369, 859, 0, 8... | 0 | 637 |
| BsmtFinType2 | object | [Unf, BLQ, NB, ALQ, Rec, LwQ, GLQ] | 0 | 7 |
| BsmtFinSF2 | int64 | [0, 32, 668, 486, 93, 491, 506, 712, 362, 41, ... | 0 | 144 |
| BsmtUnfSF | int64 | [150, 284, 434, 540, 490, 64, 317, 216, 952, 1... | 0 | 780 |
| TotalBsmtSF | int64 | [856, 1262, 920, 756, 1145, 796, 1686, 1107, 9... | 0 | 721 |
| Heating | object | [GasA, GasW, Grav, Wall, OthW, Floor] | 0 | 6 |
| HeatingQC | object | [Ex, Gd, TA, Fa, Po] | 0 | 5 |
| CentralAir | object | [Y, N] | 0 | 2 |
| Electrical | object | [SBrkr, FuseF, FuseA, FuseP, Mix, nan] | 1 | 5 |
| 1stFlrSF | int64 | [856, 1262, 920, 961, 1145, 796, 1694, 1107, 1... | 0 | 753 |
| 2ndFlrSF | int64 | [854, 0, 866, 756, 1053, 566, 983, 752, 1142, ... | 0 | 417 |
| LowQualFinSF | int64 | [0, 360, 513, 234, 528, 572, 144, 392, 371, 39... | 0 | 24 |
| GrLivArea | int64 | [1710, 1262, 1786, 1717, 2198, 1362, 1694, 209... | 0 | 861 |
| BsmtFullBath | int64 | [1, 0, 2, 3] | 0 | 4 |
| BsmtHalfBath | int64 | [0, 1, 2] | 0 | 3 |
| FullBath | int64 | [2, 1, 3, 0] | 0 | 4 |
| HalfBath | int64 | [1, 0, 2] | 0 | 3 |
| BedroomAbvGr | int64 | [3, 4, 1, 2, 0, 5, 6, 8] | 0 | 8 |
| KitchenAbvGr | int64 | [1, 2, 3, 0] | 0 | 4 |
| KitchenQual | object | [Gd, TA, Ex, Fa] | 0 | 4 |
| TotRmsAbvGrd | int64 | [8, 6, 7, 9, 5, 11, 4, 10, 12, 3, 2, 14] | 0 | 12 |
| Functional | object | [Typ, Min1, Maj1, Min2, Mod, Maj2, Sev] | 0 | 7 |
| Fireplaces | int64 | [0, 1, 2, 3] | 0 | 4 |
| FireplaceQu | object | [NF, TA, Gd, Fa, Ex, Po] | 0 | 6 |
| GarageType | object | [Attchd, Detchd, BuiltIn, CarPort, NG, Basment... | 0 | 7 |
| GarageYrBlt | float64 | [2003.0, 1976.0, 2001.0, 1998.0, 2000.0, 1993.... | 81 | 97 |
| GarageFinish | object | [RFn, Unf, Fin, NG] | 0 | 4 |
| GarageCars | int64 | [2, 3, 1, 0, 4] | 0 | 5 |
| GarageArea | int64 | [548, 460, 608, 642, 836, 480, 636, 484, 468, ... | 0 | 441 |
| GarageQual | object | [TA, Fa, Gd, NG, Ex, Po] | 0 | 6 |
| GarageCond | object | [TA, Fa, NG, Gd, Po, Ex] | 0 | 6 |
| PavedDrive | object | [Y, N, P] | 0 | 3 |
| WoodDeckSF | int64 | [0, 298, 192, 40, 255, 235, 90, 147, 140, 160,... | 0 | 274 |
| OpenPorchSF | int64 | [61, 0, 42, 35, 84, 30, 57, 204, 4, 21, 33, 21... | 0 | 202 |
| EnclosedPorch | int64 | [0, 272, 228, 205, 176, 87, 172, 102, 37, 144,... | 0 | 120 |
| 3SsnPorch | int64 | [0, 320, 407, 130, 180, 168, 140, 508, 238, 24... | 0 | 20 |
| ScreenPorch | int64 | [0, 176, 198, 291, 252, 99, 184, 168, 130, 142... | 0 | 76 |
| PoolArea | int64 | [0, 512, 648, 576, 555, 480, 519, 738] | 0 | 8 |
| PoolQC | object | [NP, Ex, Fa, Gd] | 0 | 4 |
| Fence | object | [NF, MnPrv, GdWo, GdPrv, MnWw] | 0 | 5 |
| MiscFeature | object | [NE, Shed, Gar2, Othr, TenC] | 0 | 5 |
| MiscVal | int64 | [0, 700, 350, 500, 400, 480, 450, 15500, 1200,... | 0 | 21 |
| MoSold | int64 | [2, 5, 9, 12, 10, 8, 11, 4, 1, 7, 3, 6] | 0 | 12 |
| YrSold | int64 | [2008, 2007, 2006, 2009, 2010] | 0 | 5 |
| SaleType | object | [WD, New, COD, ConLD, ConLI, CWD, ConLw, Con, ... | 0 | 9 |
| SaleCondition | object | [Normal, Abnorml, Partial, AdjLand, Alloca, Fa... | 0 | 6 |
| SalePrice | int64 | [208500, 181500, 223500, 140000, 250000, 14300... | 0 | 663 |
## Summarise the dtype, levels, null count and unique-value count of each test data column.
Observations(test_data)
| | dtypes | levels | null_values | Unique Values |
|---|---|---|---|---|
| Id | int64 | [1461, 1462, 1463, 1464, 1465, 1466, 1467, 146... | 0 | 1459 |
| MSSubClass | int64 | [20, 60, 120, 160, 80, 30, 50, 90, 85, 190, 45... | 0 | 16 |
| MSZoning | object | [RH, RL, RM, FV, C (all), nan] | 4 | 5 |
| LotFrontage | float64 | [80.0, 81.0, 74.0, 78.0, 43.0, 75.0, nan, 63.0... | 227 | 115 |
| LotArea | int64 | [11622, 14267, 13830, 9978, 5005, 10000, 7980,... | 0 | 1106 |
| Street | object | [Pave, Grvl] | 0 | 2 |
| Alley | object | [NAA, Pave, Grvl] | 0 | 3 |
| LotShape | object | [Reg, IR1, IR2, IR3] | 0 | 4 |
| LandContour | object | [Lvl, HLS, Bnk, Low] | 0 | 4 |
| Utilities | object | [AllPub, nan] | 2 | 1 |
| LotConfig | object | [Inside, Corner, FR2, CulDSac, FR3] | 0 | 5 |
| LandSlope | object | [Gtl, Mod, Sev] | 0 | 3 |
| Neighborhood | object | [NAmes, Gilbert, StoneBr, BrDale, NPkVill, Nri... | 0 | 25 |
| Condition1 | object | [Feedr, Norm, PosN, RRNe, Artery, RRNn, PosA, ... | 0 | 9 |
| Condition2 | object | [Norm, Feedr, PosA, PosN, Artery] | 0 | 5 |
| BldgType | object | [1Fam, TwnhsE, Twnhs, Duplex, 2fmCon] | 0 | 5 |
| HouseStyle | object | [1Story, 2Story, SLvl, 1.5Fin, SFoyer, 2.5Unf,... | 0 | 7 |
| OverallQual | int64 | [5, 6, 8, 7, 4, 9, 2, 3, 10, 1] | 0 | 10 |
| OverallCond | int64 | [6, 5, 7, 8, 2, 9, 3, 4, 1] | 0 | 9 |
| YearBuilt | int64 | [1961, 1958, 1997, 1998, 1992, 1993, 1990, 197... | 0 | 106 |
| YearRemodAdd | int64 | [1961, 1958, 1998, 1992, 1994, 2007, 1990, 197... | 0 | 61 |
| RoofStyle | object | [Gable, Hip, Gambrel, Flat, Mansard, Shed] | 0 | 6 |
| RoofMatl | object | [CompShg, Tar&Grv, WdShake, WdShngl] | 0 | 4 |
| Exterior1st | object | [VinylSd, Wd Sdng, HdBoard, Plywood, MetalSd, ... | 1 | 13 |
| Exterior2nd | object | [VinylSd, Wd Sdng, HdBoard, Plywood, MetalSd, ... | 1 | 15 |
| MasVnrType | object | [None, BrkFace, Stone, BrkCmn, nan] | 16 | 4 |
| MasVnrArea | float64 | [0.0, 108.0, 20.0, 504.0, 492.0, 162.0, 256.0,... | 15 | 303 |
| ExterQual | object | [TA, Gd, Ex, Fa] | 0 | 4 |
| ExterCond | object | [TA, Gd, Fa, Po, Ex] | 0 | 5 |
| Foundation | object | [CBlock, PConc, BrkTil, Stone, Slab, Wood] | 0 | 6 |
| BsmtQual | object | [TA, Gd, Ex, Fa, NB] | 0 | 5 |
| BsmtCond | object | [TA, Po, Fa, Gd, NB] | 0 | 5 |
| BsmtExposure | object | [No, Gd, Mn, Av, NB] | 0 | 5 |
| BsmtFinType1 | object | [Rec, ALQ, GLQ, Unf, BLQ, LwQ, NB] | 0 | 7 |
| BsmtFinSF1 | float64 | [468.0, 923.0, 791.0, 602.0, 263.0, 0.0, 935.0... | 1 | 669 |
| BsmtFinType2 | object | [LwQ, Unf, Rec, BLQ, GLQ, ALQ, NB] | 0 | 7 |
| BsmtFinSF2 | float64 | [144.0, 0.0, 78.0, 859.0, 981.0, 42.0, 46.0, 1... | 1 | 161 |
| BsmtUnfSF | float64 | [270.0, 406.0, 137.0, 324.0, 1017.0, 763.0, 23... | 1 | 793 |
| TotalBsmtSF | float64 | [882.0, 1329.0, 928.0, 926.0, 1280.0, 763.0, 1... | 1 | 736 |
| Heating | object | [GasA, GasW, Grav, Wall] | 0 | 4 |
| HeatingQC | object | [TA, Gd, Ex, Fa, Po] | 0 | 5 |
| CentralAir | object | [Y, N] | 0 | 2 |
| Electrical | object | [SBrkr, FuseA, FuseF, FuseP] | 0 | 4 |
| 1stFlrSF | int64 | [896, 1329, 928, 926, 1280, 763, 1187, 789, 13... | 0 | 789 |
| 2ndFlrSF | int64 | [0, 701, 678, 892, 676, 504, 567, 601, 707, 56... | 0 | 407 |
| LowQualFinSF | int64 | [0, 362, 1064, 431, 436, 259, 312, 108, 697, 5... | 0 | 15 |
| GrLivArea | int64 | [896, 1329, 1629, 1604, 1280, 1655, 1187, 1465... | 0 | 879 |
| BsmtFullBath | float64 | [0.0, 1.0, 2.0, 3.0, nan] | 2 | 4 |
| BsmtHalfBath | float64 | [0.0, 1.0, nan, 2.0] | 2 | 3 |
| FullBath | int64 | [1, 2, 3, 4, 0] | 0 | 5 |
| HalfBath | int64 | [0, 1, 2] | 0 | 3 |
| BedroomAbvGr | int64 | [2, 3, 4, 1, 6, 5, 0] | 0 | 7 |
| KitchenAbvGr | int64 | [1, 2, 0] | 0 | 3 |
| KitchenQual | object | [TA, Gd, Ex, Fa, nan] | 1 | 4 |
| TotRmsAbvGrd | int64 | [5, 6, 7, 4, 10, 8, 9, 3, 12, 11, 13, 15] | 0 | 12 |
| Functional | object | [Typ, Min2, Min1, Mod, Maj1, Sev, Maj2, nan] | 2 | 7 |
| Fireplaces | int64 | [0, 1, 2, 3, 4] | 0 | 5 |
| FireplaceQu | object | [NF, TA, Gd, Po, Fa, Ex] | 0 | 6 |
| GarageType | object | [Attchd, Detchd, BuiltIn, NG, Basment, 2Types,... | 0 | 7 |
| GarageYrBlt | float64 | [1961.0, 1958.0, 1997.0, 1998.0, 1992.0, 1993.... | 78 | 97 |
| GarageFinish | object | [Unf, Fin, RFn, NG] | 0 | 4 |
| GarageCars | float64 | [1.0, 2.0, 3.0, 0.0, 4.0, 5.0, nan] | 1 | 6 |
| GarageArea | float64 | [730.0, 312.0, 482.0, 470.0, 506.0, 440.0, 420... | 1 | 459 |
| GarageQual | object | [TA, NG, Fa, Gd, Po] | 0 | 5 |
| GarageCond | object | [TA, NG, Fa, Gd, Po, Ex] | 0 | 6 |
| PavedDrive | object | [Y, N, P] | 0 | 3 |
| WoodDeckSF | int64 | [140, 393, 212, 360, 0, 157, 483, 192, 240, 20... | 0 | 263 |
| OpenPorchSF | int64 | [0, 36, 34, 82, 84, 21, 75, 68, 30, 133, 35, 7... | 0 | 203 |
| EnclosedPorch | int64 | [0, 80, 186, 120, 150, 205, 113, 135, 126, 334... | 0 | 131 |
| 3SsnPorch | int64 | [0, 224, 255, 225, 360, 150, 153, 174, 120, 21... | 0 | 13 |
| ScreenPorch | int64 | [120, 0, 144, 256, 216, 204, 160, 240, 148, 16... | 0 | 75 |
| PoolArea | int64 | [0, 144, 368, 444, 228, 561, 800] | 0 | 7 |
| PoolQC | object | [NP, Ex, Gd] | 0 | 3 |
| Fence | object | [MnPrv, NF, GdPrv, GdWo, MnWw] | 0 | 5 |
| MiscFeature | object | [NE, Gar2, Shed, Othr] | 0 | 4 |
| MiscVal | int64 | [0, 12500, 500, 1500, 300, 450, 80, 600, 490, ... | 0 | 26 |
| MoSold | int64 | [6, 3, 1, 4, 5, 2, 7, 10, 8, 11, 9, 12] | 0 | 12 |
| YrSold | int64 | [2010, 2009, 2008, 2007, 2006] | 0 | 5 |
| SaleType | object | [WD, COD, New, ConLD, Oth, Con, ConLw, ConLI, ... | 1 | 9 |
| SaleCondition | object | [Normal, Partial, Abnorml, Family, Alloca, Adj... | 0 | 6 |
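After the label fills, both summaries still list missing values in several columns (for example LotFrontage, MasVnrArea and GarageYrBlt). A minimal sketch of how the remaining gaps could be pulled out of a frame in one step; `remaining_nulls` is a hypothetical helper and the frame below is a toy stand-in, not the competition data:

```python
import numpy as np
import pandas as pd


def remaining_nulls(df: pd.DataFrame) -> pd.Series:
    """Return null counts for the columns that still contain missing values."""
    counts = df.isna().sum()
    return counts[counts > 0].sort_values(ascending=False)


# Toy stand-in (assumption); in the notebook this would be `data` or `test_data`.
toy = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, np.nan],
    "MasVnrType": ["BrkFace", None, "Stone"],
    "LotArea": [8450, 9600, 11250],
})
nulls = remaining_nulls(toy)
print(nulls)
```

Running it on both `data` and `test_data` would show which columns need imputation in only one of the two sets.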
## Get data type of MSZoning column.
data.MSZoning.dtypes
dtype('O')
## Select the object and category columns of train data.
object_columns = data.select_dtypes(include=['object','category'])
## Select the object and category columns of test data.
test_object_columns = test_data.select_dtypes(include=['object','category'])
## Display the first five records of the object and category columns of train data.
object_columns.head()
| | MSZoning | Street | Alley | LotShape | LandContour | Utilities | LotConfig | LandSlope | Neighborhood | Condition1 | Condition2 | BldgType | HouseStyle | RoofStyle | RoofMatl | Exterior1st | Exterior2nd | MasVnrType | ExterQual | ExterCond | Foundation | BsmtQual | BsmtCond | BsmtExposure | BsmtFinType1 | BsmtFinType2 | Heating | HeatingQC | CentralAir | Electrical | KitchenQual | Functional | FireplaceQu | GarageType | GarageFinish | GarageQual | GarageCond | PavedDrive | PoolQC | Fence | MiscFeature | SaleType | SaleCondition |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | RL | Pave | NAA | Reg | Lvl | AllPub | Inside | Gtl | CollgCr | Norm | Norm | 1Fam | 2Story | Gable | CompShg | VinylSd | VinylSd | BrkFace | Gd | TA | PConc | Gd | TA | No | GLQ | Unf | GasA | Ex | Y | SBrkr | Gd | Typ | NF | Attchd | RFn | TA | TA | Y | NP | NF | NE | WD | Normal |
| 1 | RL | Pave | NAA | Reg | Lvl | AllPub | FR2 | Gtl | Veenker | Feedr | Norm | 1Fam | 1Story | Gable | CompShg | MetalSd | MetalSd | None | TA | TA | CBlock | Gd | TA | Gd | ALQ | Unf | GasA | Ex | Y | SBrkr | TA | Typ | TA | Attchd | RFn | TA | TA | Y | NP | NF | NE | WD | Normal |
| 2 | RL | Pave | NAA | IR1 | Lvl | AllPub | Inside | Gtl | CollgCr | Norm | Norm | 1Fam | 2Story | Gable | CompShg | VinylSd | VinylSd | BrkFace | Gd | TA | PConc | Gd | TA | Mn | GLQ | Unf | GasA | Ex | Y | SBrkr | Gd | Typ | TA | Attchd | RFn | TA | TA | Y | NP | NF | NE | WD | Normal |
| 3 | RL | Pave | NAA | IR1 | Lvl | AllPub | Corner | Gtl | Crawfor | Norm | Norm | 1Fam | 2Story | Gable | CompShg | Wd Sdng | Wd Shng | None | TA | TA | BrkTil | TA | Gd | No | ALQ | Unf | GasA | Gd | Y | SBrkr | Gd | Typ | Gd | Detchd | Unf | TA | TA | Y | NP | NF | NE | WD | Abnorml |
| 4 | RL | Pave | NAA | IR1 | Lvl | AllPub | FR2 | Gtl | NoRidge | Norm | Norm | 1Fam | 2Story | Gable | CompShg | VinylSd | VinylSd | BrkFace | Gd | TA | PConc | Gd | TA | Av | GLQ | Unf | GasA | Ex | Y | SBrkr | Gd | Typ | TA | Attchd | RFn | TA | TA | Y | NP | NF | NE | WD | Normal |
## Display the first five records of the object and category columns of test data.
test_object_columns.head()
| | MSZoning | Street | Alley | LotShape | LandContour | Utilities | LotConfig | LandSlope | Neighborhood | Condition1 | Condition2 | BldgType | HouseStyle | RoofStyle | RoofMatl | Exterior1st | Exterior2nd | MasVnrType | ExterQual | ExterCond | Foundation | BsmtQual | BsmtCond | BsmtExposure | BsmtFinType1 | BsmtFinType2 | Heating | HeatingQC | CentralAir | Electrical | KitchenQual | Functional | FireplaceQu | GarageType | GarageFinish | GarageQual | GarageCond | PavedDrive | PoolQC | Fence | MiscFeature | SaleType | SaleCondition |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | RH | Pave | NAA | Reg | Lvl | AllPub | Inside | Gtl | NAmes | Feedr | Norm | 1Fam | 1Story | Gable | CompShg | VinylSd | VinylSd | None | TA | TA | CBlock | TA | TA | No | Rec | LwQ | GasA | TA | Y | SBrkr | TA | Typ | NF | Attchd | Unf | TA | TA | Y | NP | MnPrv | NE | WD | Normal |
| 1 | RL | Pave | NAA | IR1 | Lvl | AllPub | Corner | Gtl | NAmes | Norm | Norm | 1Fam | 1Story | Hip | CompShg | Wd Sdng | Wd Sdng | BrkFace | TA | TA | CBlock | TA | TA | No | ALQ | Unf | GasA | TA | Y | SBrkr | Gd | Typ | NF | Attchd | Unf | TA | TA | Y | NP | NF | Gar2 | WD | Normal |
| 2 | RL | Pave | NAA | IR1 | Lvl | AllPub | Inside | Gtl | Gilbert | Norm | Norm | 1Fam | 2Story | Gable | CompShg | VinylSd | VinylSd | None | TA | TA | PConc | Gd | TA | No | GLQ | Unf | GasA | Gd | Y | SBrkr | TA | Typ | TA | Attchd | Fin | TA | TA | Y | NP | MnPrv | NE | WD | Normal |
| 3 | RL | Pave | NAA | IR1 | Lvl | AllPub | Inside | Gtl | Gilbert | Norm | Norm | 1Fam | 2Story | Gable | CompShg | VinylSd | VinylSd | BrkFace | TA | TA | PConc | TA | TA | No | GLQ | Unf | GasA | Ex | Y | SBrkr | Gd | Typ | Gd | Attchd | Fin | TA | TA | Y | NP | NF | NE | WD | Normal |
| 4 | RL | Pave | NAA | IR1 | HLS | AllPub | Inside | Gtl | StoneBr | Norm | Norm | TwnhsE | 1Story | Gable | CompShg | HdBoard | HdBoard | None | Gd | TA | PConc | Gd | TA | No | ALQ | Unf | GasA | Ex | Y | SBrkr | Gd | Typ | NF | Attchd | RFn | TA | TA | Y | NP | NF | NE | WD | Normal |
## Convert the object columns of train data to the category data type.
for col in object_columns.columns:
    data[col] = data[col].astype('str').astype('category')
## Get columns data types of train data.
data.dtypes
Id    int64
MSSubClass    int64
MSZoning    category
LotFrontage    float64
LotArea    int64
Street    category
Alley    category
LotShape    category
LandContour    category
Utilities    category
LotConfig    category
LandSlope    category
Neighborhood    category
Condition1    category
Condition2    category
BldgType    category
HouseStyle    category
OverallQual    int64
OverallCond    int64
YearBuilt    int64
YearRemodAdd    int64
RoofStyle    category
RoofMatl    category
Exterior1st    category
Exterior2nd    category
MasVnrType    category
MasVnrArea    float64
ExterQual    category
ExterCond    category
Foundation    category
BsmtQual    category
BsmtCond    category
BsmtExposure    category
BsmtFinType1    category
BsmtFinSF1    int64
BsmtFinType2    category
BsmtFinSF2    int64
BsmtUnfSF    int64
TotalBsmtSF    int64
Heating    category
HeatingQC    category
CentralAir    category
Electrical    category
1stFlrSF    int64
2ndFlrSF    int64
LowQualFinSF    int64
GrLivArea    int64
BsmtFullBath    int64
BsmtHalfBath    int64
FullBath    int64
HalfBath    int64
BedroomAbvGr    int64
KitchenAbvGr    int64
KitchenQual    category
TotRmsAbvGrd    int64
Functional    category
Fireplaces    int64
FireplaceQu    category
GarageType    category
GarageYrBlt    float64
GarageFinish    category
GarageCars    int64
GarageArea    int64
GarageQual    category
GarageCond    category
PavedDrive    category
WoodDeckSF    int64
OpenPorchSF    int64
EnclosedPorch    int64
3SsnPorch    int64
ScreenPorch    int64
PoolArea    int64
PoolQC    category
Fence    category
MiscFeature    category
MiscVal    int64
MoSold    int64
YrSold    int64
SaleType    category
SaleCondition    category
SalePrice    int64
dtype: object
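The conversion above goes through `astype('str')` on columns that still hold missing values (the train summary lists 8 nulls in MasVnrType and 1 in Electrical). On pandas versions where that cast renders NaN as the text 'nan', those entries would end up as an ordinary category level. A minimal sketch, on a toy Series and not the notebook's data, of casting straight to category, which keeps the entry missing:

```python
import numpy as np
import pandas as pd

# Toy column with one missing value (assumption: stands in for a column like Electrical).
raw = pd.Series(["SBrkr", "FuseA", np.nan], dtype="object")

# Casting directly to category leaves the missing entry as NaN;
# only the observed values become category levels.
direct = raw.astype("category")

print(direct.cat.categories.tolist())
print(direct.isna().sum())
```

Whether a separate 'nan' level or a true missing value is preferable depends on how the later modelling steps treat nulls, so this is an alternative to weigh, not a correction.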
## Convert the object columns of test data to the category data type.
for col in test_object_columns.columns:
    test_data[col] = test_data[col].astype('str').astype('category')
## Get column data types of test data.
test_data.dtypes
Id    int64
MSSubClass    int64
MSZoning    category
LotFrontage    float64
LotArea    int64
Street    category
Alley    category
LotShape    category
LandContour    category
Utilities    category
LotConfig    category
LandSlope    category
Neighborhood    category
Condition1    category
Condition2    category
BldgType    category
HouseStyle    category
OverallQual    int64
OverallCond    int64
YearBuilt    int64
YearRemodAdd    int64
RoofStyle    category
RoofMatl    category
Exterior1st    category
Exterior2nd    category
MasVnrType    category
MasVnrArea    float64
ExterQual    category
ExterCond    category
Foundation    category
BsmtQual    category
BsmtCond    category
BsmtExposure    category
BsmtFinType1    category
BsmtFinSF1    float64
BsmtFinType2    category
BsmtFinSF2    float64
BsmtUnfSF    float64
TotalBsmtSF    float64
Heating    category
HeatingQC    category
CentralAir    category
Electrical    category
1stFlrSF    int64
2ndFlrSF    int64
LowQualFinSF    int64
GrLivArea    int64
BsmtFullBath    float64
BsmtHalfBath    float64
FullBath    int64
HalfBath    int64
BedroomAbvGr    int64
KitchenAbvGr    int64
KitchenQual    category
TotRmsAbvGrd    int64
Functional    category
Fireplaces    int64
FireplaceQu    category
GarageType    category
GarageYrBlt    float64
GarageFinish    category
GarageCars    float64
GarageArea    float64
GarageQual    category
GarageCond    category
PavedDrive    category
WoodDeckSF    int64
OpenPorchSF    int64
EnclosedPorch    int64
3SsnPorch    int64
ScreenPorch    int64
PoolArea    int64
PoolQC    category
Fence    category
MiscFeature    category
MiscVal    int64
MoSold    int64
YrSold    int64
SaleType    category
SaleCondition    category
dtype: object
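Because train and test are converted separately, each column gets its own set of category levels; the summaries above show, for example, 8 RoofMatl levels in train but only 4 in test. A minimal sketch, assuming one wants identical levels (and therefore identical integer codes) in both sets; `share_categories` is a hypothetical helper and the two Series are toy data:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype


def share_categories(train_col: pd.Series, test_col: pd.Series):
    """Cast both columns to one categorical dtype built from the union of their values."""
    levels = sorted(set(train_col.dropna()) | set(test_col.dropna()))
    shared = CategoricalDtype(categories=levels)
    return train_col.astype(shared), test_col.astype(shared)


# Toy stand-ins (assumption) for a column such as RoofMatl in train and test.
toy_train = pd.Series(["CompShg", "Metal", "WdShngl"])
toy_test = pd.Series(["CompShg", "Tar&Grv"])

train_cat, test_cat = share_categories(toy_train, toy_test)
print(train_cat.cat.categories.tolist())
print(train_cat.cat.codes.tolist(), test_cat.cat.codes.tolist())
```

With a shared dtype, a given label maps to the same code in both frames, which matters if the codes or dummy columns are later fed to a model.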
## Convert numeric columns of train data into categorical variables.
cols = ['MSSubClass','OverallQual','OverallCond']
for col in cols:
    data[col] = data[col].astype('str').astype('category')
## Convert the same numeric columns of test data into categorical variables.
for col in cols:
    test_data[col] = test_data[col].astype('str').astype('category')
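Converting train and test separately gives each frame its own set of categories, so a level that appears in only one of them (for example an MSSubClass code) would be missing from the other frame's category list. One possible way to keep the two aligned is a shared `CategoricalDtype` built from the union of levels. This is a sketch under that assumption, using toy frames named `train` and `test`, and is not part of the original notebook.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Toy stand-ins (hypothetical values); the level 120 occurs only in test, 70 only in train.
train = pd.DataFrame({"MSSubClass": [60, 20, 70]})
test = pd.DataFrame({"MSSubClass": [20, 120, 60]})

col = "MSSubClass"

# Build one category list from the levels seen in either frame.
levels = sorted(set(train[col].astype(str)) | set(test[col].astype(str)))
shared_dtype = CategoricalDtype(categories=levels)

# Apply the same dtype to both frames so their category codes line up.
train[col] = train[col].astype(str).astype(shared_dtype)
test[col] = test[col].astype(str).astype(shared_dtype)

print(train[col].cat.categories.tolist())
print(test[col].cat.categories.tolist())
```

With identical categories in both frames, later encoding steps such as one-hot encoding produce the same columns for train and test.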
## Get column data types of train data.
data.dtypes
Id int64 MSSubClass category MSZoning category LotFrontage float64 LotArea int64 Street category Alley category LotShape category LandContour category Utilities category LotConfig category LandSlope category Neighborhood category Condition1 category Condition2 category BldgType category HouseStyle category OverallQual category OverallCond category YearBuilt int64 YearRemodAdd int64 RoofStyle category RoofMatl category Exterior1st category Exterior2nd category MasVnrType category MasVnrArea float64 ExterQual category ExterCond category Foundation category BsmtQual category BsmtCond category BsmtExposure category BsmtFinType1 category BsmtFinSF1 int64 BsmtFinType2 category BsmtFinSF2 int64 BsmtUnfSF int64 TotalBsmtSF int64 Heating category HeatingQC category CentralAir category Electrical category 1stFlrSF int64 2ndFlrSF int64 LowQualFinSF int64 GrLivArea int64 BsmtFullBath int64 BsmtHalfBath int64 FullBath int64 HalfBath int64 BedroomAbvGr int64 KitchenAbvGr int64 KitchenQual category TotRmsAbvGrd int64 Functional category Fireplaces int64 FireplaceQu category GarageType category GarageYrBlt float64 GarageFinish category GarageCars int64 GarageArea int64 GarageQual category GarageCond category PavedDrive category WoodDeckSF int64 OpenPorchSF int64 EnclosedPorch int64 3SsnPorch int64 ScreenPorch int64 PoolArea int64 PoolQC category Fence category MiscFeature category MiscVal int64 MoSold int64 YrSold int64 SaleType category SaleCondition category SalePrice int64 dtype: object
## Get column data types of test data.
test_data.dtypes
Id int64 MSSubClass category MSZoning category LotFrontage float64 LotArea int64 Street category Alley category LotShape category LandContour category Utilities category LotConfig category LandSlope category Neighborhood category Condition1 category Condition2 category BldgType category HouseStyle category OverallQual category OverallCond category YearBuilt int64 YearRemodAdd int64 RoofStyle category RoofMatl category Exterior1st category Exterior2nd category MasVnrType category MasVnrArea float64 ExterQual category ExterCond category Foundation category BsmtQual category BsmtCond category BsmtExposure category BsmtFinType1 category BsmtFinSF1 float64 BsmtFinType2 category BsmtFinSF2 float64 BsmtUnfSF float64 TotalBsmtSF float64 Heating category HeatingQC category CentralAir category Electrical category 1stFlrSF int64 2ndFlrSF int64 LowQualFinSF int64 GrLivArea int64 BsmtFullBath float64 BsmtHalfBath float64 FullBath int64 HalfBath int64 BedroomAbvGr int64 KitchenAbvGr int64 KitchenQual category TotRmsAbvGrd int64 Functional category Fireplaces int64 FireplaceQu category GarageType category GarageYrBlt float64 GarageFinish category GarageCars float64 GarageArea float64 GarageQual category GarageCond category PavedDrive category WoodDeckSF int64 OpenPorchSF int64 EnclosedPorch int64 3SsnPorch int64 ScreenPorch int64 PoolArea int64 PoolQC category Fence category MiscFeature category MiscVal int64 MoSold int64 YrSold int64 SaleType category SaleCondition category dtype: object
## Set index for train data and display first 5 records.
data = data.set_index('Id')
data.head()
| MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | LotConfig | LandSlope | Neighborhood | Condition1 | Condition2 | BldgType | HouseStyle | OverallQual | OverallCond | YearBuilt | YearRemodAdd | RoofStyle | RoofMatl | Exterior1st | Exterior2nd | MasVnrType | MasVnrArea | ExterQual | ExterCond | Foundation | BsmtQual | BsmtCond | BsmtExposure | BsmtFinType1 | BsmtFinSF1 | BsmtFinType2 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | Heating | HeatingQC | CentralAir | Electrical | 1stFlrSF | 2ndFlrSF | LowQualFinSF | GrLivArea | BsmtFullBath | BsmtHalfBath | FullBath | HalfBath | BedroomAbvGr | KitchenAbvGr | KitchenQual | TotRmsAbvGrd | Functional | Fireplaces | FireplaceQu | GarageType | GarageYrBlt | GarageFinish | GarageCars | GarageArea | GarageQual | GarageCond | PavedDrive | WoodDeckSF | OpenPorchSF | EnclosedPorch | 3SsnPorch | ScreenPorch | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Id | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 1 | 60 | RL | 65.0 | 8450 | Pave | NAA | Reg | Lvl | AllPub | Inside | Gtl | CollgCr | Norm | Norm | 1Fam | 2Story | 7 | 5 | 2003 | 2003 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 196.0 | Gd | TA | PConc | Gd | TA | No | GLQ | 706 | Unf | 0 | 150 | 856 | GasA | Ex | Y | SBrkr | 856 | 854 | 0 | 1710 | 1 | 0 | 2 | 1 | 3 | 1 | Gd | 8 | Typ | 0 | NF | Attchd | 2003.0 | RFn | 2 | 548 | TA | TA | Y | 0 | 61 | 0 | 0 | 0 | 0 | NP | NF | NE | 0 | 2 | 2008 | WD | Normal | 208500 |
| 2 | 20 | RL | 80.0 | 9600 | Pave | NAA | Reg | Lvl | AllPub | FR2 | Gtl | Veenker | Feedr | Norm | 1Fam | 1Story | 6 | 8 | 1976 | 1976 | Gable | CompShg | MetalSd | MetalSd | None | 0.0 | TA | TA | CBlock | Gd | TA | Gd | ALQ | 978 | Unf | 0 | 284 | 1262 | GasA | Ex | Y | SBrkr | 1262 | 0 | 0 | 1262 | 0 | 1 | 2 | 0 | 3 | 1 | TA | 6 | Typ | 1 | TA | Attchd | 1976.0 | RFn | 2 | 460 | TA | TA | Y | 298 | 0 | 0 | 0 | 0 | 0 | NP | NF | NE | 0 | 5 | 2007 | WD | Normal | 181500 |
| 3 | 60 | RL | 68.0 | 11250 | Pave | NAA | IR1 | Lvl | AllPub | Inside | Gtl | CollgCr | Norm | Norm | 1Fam | 2Story | 7 | 5 | 2001 | 2002 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 162.0 | Gd | TA | PConc | Gd | TA | Mn | GLQ | 486 | Unf | 0 | 434 | 920 | GasA | Ex | Y | SBrkr | 920 | 866 | 0 | 1786 | 1 | 0 | 2 | 1 | 3 | 1 | Gd | 6 | Typ | 1 | TA | Attchd | 2001.0 | RFn | 2 | 608 | TA | TA | Y | 0 | 42 | 0 | 0 | 0 | 0 | NP | NF | NE | 0 | 9 | 2008 | WD | Normal | 223500 |
| 4 | 70 | RL | 60.0 | 9550 | Pave | NAA | IR1 | Lvl | AllPub | Corner | Gtl | Crawfor | Norm | Norm | 1Fam | 2Story | 7 | 5 | 1915 | 1970 | Gable | CompShg | Wd Sdng | Wd Shng | None | 0.0 | TA | TA | BrkTil | TA | Gd | No | ALQ | 216 | Unf | 0 | 540 | 756 | GasA | Gd | Y | SBrkr | 961 | 756 | 0 | 1717 | 1 | 0 | 1 | 0 | 3 | 1 | Gd | 7 | Typ | 1 | Gd | Detchd | 1998.0 | Unf | 3 | 642 | TA | TA | Y | 0 | 35 | 272 | 0 | 0 | 0 | NP | NF | NE | 0 | 2 | 2006 | WD | Abnorml | 140000 |
| 5 | 60 | RL | 84.0 | 14260 | Pave | NAA | IR1 | Lvl | AllPub | FR2 | Gtl | NoRidge | Norm | Norm | 1Fam | 2Story | 8 | 5 | 2000 | 2000 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 350.0 | Gd | TA | PConc | Gd | TA | Av | GLQ | 655 | Unf | 0 | 490 | 1145 | GasA | Ex | Y | SBrkr | 1145 | 1053 | 0 | 2198 | 1 | 0 | 2 | 1 | 4 | 1 | Gd | 9 | Typ | 1 | TA | Attchd | 2000.0 | RFn | 3 | 836 | TA | TA | Y | 192 | 84 | 0 | 0 | 0 | 0 | NP | NF | NE | 0 | 12 | 2008 | WD | Normal | 250000 |
## Set index for test data and display first 5 records.
test_data = test_data.set_index('Id')
test_data.head()
| MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | LotConfig | LandSlope | Neighborhood | Condition1 | Condition2 | BldgType | HouseStyle | OverallQual | OverallCond | YearBuilt | YearRemodAdd | RoofStyle | RoofMatl | Exterior1st | Exterior2nd | MasVnrType | MasVnrArea | ExterQual | ExterCond | Foundation | BsmtQual | BsmtCond | BsmtExposure | BsmtFinType1 | BsmtFinSF1 | BsmtFinType2 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | Heating | HeatingQC | CentralAir | Electrical | 1stFlrSF | 2ndFlrSF | LowQualFinSF | GrLivArea | BsmtFullBath | BsmtHalfBath | FullBath | HalfBath | BedroomAbvGr | KitchenAbvGr | KitchenQual | TotRmsAbvGrd | Functional | Fireplaces | FireplaceQu | GarageType | GarageYrBlt | GarageFinish | GarageCars | GarageArea | GarageQual | GarageCond | PavedDrive | WoodDeckSF | OpenPorchSF | EnclosedPorch | 3SsnPorch | ScreenPorch | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Id | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 1461 | 20 | RH | 80.0 | 11622 | Pave | NAA | Reg | Lvl | AllPub | Inside | Gtl | NAmes | Feedr | Norm | 1Fam | 1Story | 5 | 6 | 1961 | 1961 | Gable | CompShg | VinylSd | VinylSd | None | 0.0 | TA | TA | CBlock | TA | TA | No | Rec | 468.0 | LwQ | 144.0 | 270.0 | 882.0 | GasA | TA | Y | SBrkr | 896 | 0 | 0 | 896 | 0.0 | 0.0 | 1 | 0 | 2 | 1 | TA | 5 | Typ | 0 | NF | Attchd | 1961.0 | Unf | 1.0 | 730.0 | TA | TA | Y | 140 | 0 | 0 | 0 | 120 | 0 | NP | MnPrv | NE | 0 | 6 | 2010 | WD | Normal |
| 1462 | 20 | RL | 81.0 | 14267 | Pave | NAA | IR1 | Lvl | AllPub | Corner | Gtl | NAmes | Norm | Norm | 1Fam | 1Story | 6 | 6 | 1958 | 1958 | Hip | CompShg | Wd Sdng | Wd Sdng | BrkFace | 108.0 | TA | TA | CBlock | TA | TA | No | ALQ | 923.0 | Unf | 0.0 | 406.0 | 1329.0 | GasA | TA | Y | SBrkr | 1329 | 0 | 0 | 1329 | 0.0 | 0.0 | 1 | 1 | 3 | 1 | Gd | 6 | Typ | 0 | NF | Attchd | 1958.0 | Unf | 1.0 | 312.0 | TA | TA | Y | 393 | 36 | 0 | 0 | 0 | 0 | NP | NF | Gar2 | 12500 | 6 | 2010 | WD | Normal |
| 1463 | 60 | RL | 74.0 | 13830 | Pave | NAA | IR1 | Lvl | AllPub | Inside | Gtl | Gilbert | Norm | Norm | 1Fam | 2Story | 5 | 5 | 1997 | 1998 | Gable | CompShg | VinylSd | VinylSd | None | 0.0 | TA | TA | PConc | Gd | TA | No | GLQ | 791.0 | Unf | 0.0 | 137.0 | 928.0 | GasA | Gd | Y | SBrkr | 928 | 701 | 0 | 1629 | 0.0 | 0.0 | 2 | 1 | 3 | 1 | TA | 6 | Typ | 1 | TA | Attchd | 1997.0 | Fin | 2.0 | 482.0 | TA | TA | Y | 212 | 34 | 0 | 0 | 0 | 0 | NP | MnPrv | NE | 0 | 3 | 2010 | WD | Normal |
| 1464 | 60 | RL | 78.0 | 9978 | Pave | NAA | IR1 | Lvl | AllPub | Inside | Gtl | Gilbert | Norm | Norm | 1Fam | 2Story | 6 | 6 | 1998 | 1998 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 20.0 | TA | TA | PConc | TA | TA | No | GLQ | 602.0 | Unf | 0.0 | 324.0 | 926.0 | GasA | Ex | Y | SBrkr | 926 | 678 | 0 | 1604 | 0.0 | 0.0 | 2 | 1 | 3 | 1 | Gd | 7 | Typ | 1 | Gd | Attchd | 1998.0 | Fin | 2.0 | 470.0 | TA | TA | Y | 360 | 36 | 0 | 0 | 0 | 0 | NP | NF | NE | 0 | 6 | 2010 | WD | Normal |
| 1465 | 120 | RL | 43.0 | 5005 | Pave | NAA | IR1 | HLS | AllPub | Inside | Gtl | StoneBr | Norm | Norm | TwnhsE | 1Story | 8 | 5 | 1992 | 1992 | Gable | CompShg | HdBoard | HdBoard | None | 0.0 | Gd | TA | PConc | Gd | TA | No | ALQ | 263.0 | Unf | 0.0 | 1017.0 | 1280.0 | GasA | Ex | Y | SBrkr | 1280 | 0 | 0 | 1280 | 0.0 | 0.0 | 2 | 0 | 2 | 1 | Gd | 5 | Typ | 0 | NF | Attchd | 1992.0 | RFn | 2.0 | 506.0 | TA | TA | Y | 0 | 82 | 0 | 0 | 144 | 0 | NP | NF | NE | 0 | 1 | 2010 | WD | Normal |
## Separate numeric and categorical columns for train data.
cat_columns = data.select_dtypes(include=['category'])
num_columns = data.select_dtypes(include=['int64', 'float64'])
## Separate numeric and categorical columns for test data.
test_cat_columns = test_data.select_dtypes(include=['category'])
test_num_columns = test_data.select_dtypes(include=['int64', 'float64'])
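Despite the `_columns` suffix, `select_dtypes` returns a DataFrame holding only the matching columns, which is why the notebook later reads the names through `.columns`. A small sketch on a toy frame (hypothetical values) illustrates the split; it uses `include=['number']`, a broader selector than the explicit `['int64', 'float64']` list above, as an assumption about what might be preferred when other numeric dtypes can occur.

```python
import pandas as pd

# Toy frame with one categorical and two numeric columns (hypothetical values).
df = pd.DataFrame({
    "MSZoning": pd.Series(["RL", "RH", "RL"], dtype="category"),
    "LotArea": [8450, 9600, 11250],
    "LotFrontage": [65.0, None, 68.0],
})

# Each call returns a DataFrame restricted to the requested dtypes.
cat_part = df.select_dtypes(include=["category"])
num_part = df.select_dtypes(include=["number"])

print(list(cat_part.columns))
print(list(num_part.columns))
```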
## Get unique values for BsmtFullBath column of train data.
data.BsmtFullBath.nunique()
4
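## The logic below checks for special characters in the numeric columns (train data).
for col in num_columns.columns:
    print('\n', col, '----->')
    # Iterate over the Id index itself so that the last row is also checked.
    for index in data.index:
        try:
            skip = float(data.loc[index, col])
            skip = int(data.loc[index, col])
        except ValueError:
            # int() raises ValueError for NaN, so missing values are reported here as well.
            print(index, data.loc[index, col])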
LotFrontage -----> 8 nan 13 nan 15 nan 17 nan 25 nan 32 nan 43 nan 44 nan 51 nan 65 nan 67 nan 77 nan 85 nan 96 nan 101 nan 105 nan 112 nan 114 nan 117 nan 121 nan 127 nan 132 nan 134 nan 137 nan 148 nan 150 nan 153 nan 154 nan 161 nan 167 nan 170 nan 171 nan 178 nan 181 nan 187 nan 192 nan 204 nan 208 nan 209 nan 215 nan 219 nan 222 nan 235 nan 238 nan 245 nan 250 nan 270 nan 288 nan 289 nan 294 nan 308 nan 309 nan 311 nan 320 nan 329 nan 331 nan 336 nan 343 nan 347 nan 348 nan 352 nan 357 nan 361 nan 362 nan 365 nan 367 nan 370 nan 371 nan 376 nan 385 nan 393 nan 394 nan 405 nan 406 nan 413 nan 422 nan 427 nan 448 nan 453 nan 458 nan 459 nan 460 nan 466 nan 471 nan 485 nan 491 nan 497 nan 517 nan 519 nan 530 nan 538 nan 539 nan 540 nan 542 nan 546 nan 560 nan 561 nan 565 nan 570 nan 581 nan 594 nan 611 nan 612 nan 613 nan 617 nan 624 nan 627 nan 642 nan 646 nan 661 nan 667 nan 669 nan 673 nan 680 nan 683 nan 686 nan 688 nan 691 nan 707 nan 710 nan 715 nan 721 nan 722 nan 727 nan 735 nan 746 nan 747 nan 752 nan 758 nan 771 nan 784 nan 786 nan 790 nan 792 nan 795 nan 812 nan 817 nan 818 nan 823 nan 829 nan 841 nan 846 nan 852 nan 854 nan 856 nan 857 nan 860 nan 866 nan 869 nan 880 nan 883 nan 894 nan 901 nan 905 nan 909 nan 912 nan 918 nan 926 nan 928 nan 929 nan 930 nan 940 nan 942 nan 945 nan 954 nan 962 nan 968 nan 976 nan 981 nan 984 nan 989 nan 997 nan 998 nan 1004 nan 1007 nan 1018 nan 1019 nan 1025 nan 1031 nan 1033 nan 1034 nan 1036 nan 1038 nan 1042 nan 1046 nan 1058 nan 1060 nan 1065 nan 1078 nan 1085 nan 1087 nan 1098 nan 1109 nan 1111 nan 1117 nan 1123 nan 1125 nan 1139 nan 1142 nan 1144 nan 1147 nan 1149 nan 1154 nan 1155 nan 1162 nan 1165 nan 1178 nan 1181 nan 1191 nan 1194 nan 1207 nan 1214 nan 1231 nan 1234 nan 1245 nan 1248 nan 1252 nan 1254 nan 1261 nan 1263 nan 1269 nan 1271 nan 1272 nan 1273 nan 1277 nan 1278 nan 1287 nan 1288 nan 1291 nan 1301 nan 1302 nan 1310 nan 1313 nan 1319 nan 1322 nan 1343 nan 1347 nan 1349 nan 1355 nan 1357 nan 1358 nan 
1359 nan 1363 nan 1366 nan 1369 nan 1374 nan 1382 nan 1384 nan 1397 nan 1408 nan 1418 nan 1420 nan 1424 nan 1425 nan 1430 nan 1432 nan 1442 nan 1444 nan 1447 nan LotArea -----> YearBuilt -----> YearRemodAdd -----> MasVnrArea -----> 235 nan 530 nan 651 nan 937 nan 974 nan 978 nan 1244 nan 1279 nan BsmtFinSF1 -----> BsmtFinSF2 -----> BsmtUnfSF -----> TotalBsmtSF -----> 1stFlrSF -----> 2ndFlrSF -----> LowQualFinSF -----> GrLivArea -----> BsmtFullBath -----> BsmtHalfBath -----> FullBath -----> HalfBath -----> BedroomAbvGr -----> KitchenAbvGr -----> TotRmsAbvGrd -----> Fireplaces -----> GarageYrBlt -----> 40 nan 49 nan 79 nan 89 nan 90 nan 100 nan 109 nan 126 nan 128 nan 141 nan 149 nan 156 nan 164 nan 166 nan 199 nan 211 nan 242 nan 251 nan 288 nan 292 nan 308 nan 376 nan 387 nan 394 nan 432 nan 435 nan 442 nan 465 nan 496 nan 521 nan 529 nan 534 nan 536 nan 563 nan 583 nan 614 nan 615 nan 621 nan 636 nan 637 nan 639 nan 650 nan 706 nan 711 nan 739 nan 751 nan 785 nan 827 nan 844 nan 922 nan 943 nan 955 nan 961 nan 969 nan 971 nan 977 nan 1010 nan 1012 nan 1031 nan 1039 nan 1097 nan 1124 nan 1132 nan 1138 nan 1144 nan 1174 nan 1180 nan 1219 nan 1220 nan 1235 nan 1258 nan 1284 nan 1324 nan 1326 nan 1327 nan 1338 nan 1350 nan 1408 nan 1450 nan 1451 nan 1454 nan GarageCars -----> GarageArea -----> WoodDeckSF -----> OpenPorchSF -----> EnclosedPorch -----> 3SsnPorch -----> ScreenPorch -----> PoolArea -----> MiscVal -----> MoSold -----> YrSold -----> SalePrice ----->
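In the output above, every reported value is `nan`: the columns are already numeric, so the only conversion that fails is `int()` on a missing value. Only LotFrontage, MasVnrArea and GarageYrBlt are affected in the train data. Assuming the goal is just to locate those missing values, a vectorised alternative could look like the sketch below. It runs on a toy frame with hypothetical values and the helper names are illustrative.

```python
import pandas as pd

# Toy frame indexed by Id (hypothetical values).
df = pd.DataFrame(
    {
        "LotFrontage": [65.0, None, 68.0, None],
        "LotArea": [8450, 9600, 11250, 9550],
    },
    index=pd.Index([1, 2, 3, 4], name="Id"),
)

numeric = df.select_dtypes(include=["number"])

# Number of missing values per numeric column.
missing_counts = numeric.isna().sum()

# Ids of the rows with a missing value, only for columns that have any.
missing_ids = {}
for col in numeric.columns:
    mask = numeric[col].isna()
    if mask.any():
        missing_ids[col] = numeric.index[mask].tolist()

print(missing_counts)
print(missing_ids)
```

This avoids the row-by-row `.loc` lookups, which become slow on larger frames.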
## Display train data numeric columns.
num_columns.columns
Index(['LotFrontage', 'LotArea', 'YearBuilt', 'YearRemodAdd', 'MasVnrArea',
'BsmtFinSF1', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', '1stFlrSF',
'2ndFlrSF', 'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath',
'FullBath', 'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'TotRmsAbvGrd',
'Fireplaces', 'GarageYrBlt', 'GarageCars', 'GarageArea', 'WoodDeckSF',
'OpenPorchSF', 'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea',
'MiscVal', 'MoSold', 'YrSold', 'SalePrice'],
dtype='object')
## Display test data numeric columns.
test_num_columns.columns
Index(['LotFrontage', 'LotArea', 'YearBuilt', 'YearRemodAdd', 'MasVnrArea',
'BsmtFinSF1', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', '1stFlrSF',
'2ndFlrSF', 'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath',
'FullBath', 'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'TotRmsAbvGrd',
'Fireplaces', 'GarageYrBlt', 'GarageCars', 'GarageArea', 'WoodDeckSF',
'OpenPorchSF', 'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea',
'MiscVal', 'MoSold', 'YrSold'],
dtype='object')
## Check correlation between numeric columns of train data.
data[num_columns.columns].corr()
| | LotFrontage | LotArea | YearBuilt | YearRemodAdd | MasVnrArea | BsmtFinSF1 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | 1stFlrSF | 2ndFlrSF | LowQualFinSF | GrLivArea | BsmtFullBath | BsmtHalfBath | FullBath | HalfBath | BedroomAbvGr | KitchenAbvGr | TotRmsAbvGrd | Fireplaces | GarageYrBlt | GarageCars | GarageArea | WoodDeckSF | OpenPorchSF | EnclosedPorch | 3SsnPorch | ScreenPorch | PoolArea | MiscVal | MoSold | YrSold | SalePrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LotFrontage | 1.000000 | 0.426095 | 0.123349 | 0.088866 | 0.193458 | 0.233633 | 0.049900 | 0.132644 | 0.392075 | 0.457181 | 0.080177 | 0.038469 | 0.402797 | 0.100949 | -0.007234 | 0.198769 | 0.053532 | 0.263170 | -0.006069 | 0.352096 | 0.266639 | 0.070250 | 0.285691 | 0.344997 | 0.088521 | 0.151972 | 0.010700 | 0.070029 | 0.041383 | 0.206167 | 0.003368 | 0.011200 | 0.007450 | 0.351799 |
| LotArea | 0.426095 | 1.000000 | 0.014228 | 0.013788 | 0.104160 | 0.214103 | 0.111170 | -0.002618 | 0.260833 | 0.299475 | 0.050986 | 0.004779 | 0.263116 | 0.158155 | 0.048046 | 0.126031 | 0.014259 | 0.119690 | -0.017784 | 0.190015 | 0.271364 | -0.024947 | 0.154871 | 0.180403 | 0.171698 | 0.084774 | -0.018340 | 0.020423 | 0.043160 | 0.077672 | 0.038068 | 0.001205 | -0.014261 | 0.263843 |
| YearBuilt | 0.123349 | 0.014228 | 1.000000 | 0.592855 | 0.315707 | 0.249503 | -0.049107 | 0.149040 | 0.391452 | 0.281986 | 0.010308 | -0.183784 | 0.199010 | 0.187599 | -0.038162 | 0.468271 | 0.242656 | -0.070651 | -0.174800 | 0.095589 | 0.147716 | 0.825667 | 0.537850 | 0.478954 | 0.224880 | 0.188686 | -0.387268 | 0.031355 | -0.050364 | 0.004950 | -0.034383 | 0.012398 | -0.013618 | 0.522897 |
| YearRemodAdd | 0.088866 | 0.013788 | 0.592855 | 1.000000 | 0.179618 | 0.128451 | -0.067759 | 0.181133 | 0.291066 | 0.240379 | 0.140024 | -0.062419 | 0.287389 | 0.119470 | -0.012337 | 0.439046 | 0.183331 | -0.040581 | -0.149598 | 0.191740 | 0.112581 | 0.642277 | 0.420622 | 0.371600 | 0.205726 | 0.226298 | -0.193919 | 0.045286 | -0.038740 | 0.005829 | -0.010286 | 0.021490 | 0.035743 | 0.507101 |
| MasVnrArea | 0.193458 | 0.104160 | 0.315707 | 0.179618 | 1.000000 | 0.264736 | -0.072319 | 0.114442 | 0.363936 | 0.344501 | 0.174561 | -0.069071 | 0.390857 | 0.085310 | 0.026673 | 0.276833 | 0.201444 | 0.102821 | -0.037610 | 0.280682 | 0.249070 | 0.252691 | 0.364204 | 0.373066 | 0.159718 | 0.125703 | -0.110204 | 0.018796 | 0.061466 | 0.011723 | -0.029815 | -0.005965 | -0.008201 | 0.477493 |
| BsmtFinSF1 | 0.233633 | 0.214103 | 0.249503 | 0.128451 | 0.264736 | 1.000000 | -0.050117 | -0.495251 | 0.522396 | 0.445863 | -0.137079 | -0.064503 | 0.208171 | 0.649212 | 0.067418 | 0.058543 | 0.004262 | -0.107355 | -0.081007 | 0.044316 | 0.260011 | 0.153484 | 0.224054 | 0.296970 | 0.204306 | 0.111761 | -0.102303 | 0.026451 | 0.062021 | 0.140491 | 0.003571 | -0.015727 | 0.014359 | 0.386420 |
| BsmtFinSF2 | 0.049900 | 0.111170 | -0.049107 | -0.067759 | -0.072319 | -0.050117 | 1.000000 | -0.209294 | 0.104810 | 0.097117 | -0.099260 | 0.014807 | -0.009640 | 0.158678 | 0.070948 | -0.076444 | -0.032148 | -0.015728 | -0.040751 | -0.035227 | 0.046921 | -0.088011 | -0.038264 | -0.018227 | 0.067898 | 0.003093 | 0.036543 | -0.029993 | 0.088871 | 0.041709 | 0.004940 | -0.015211 | 0.031706 | -0.011378 |
| BsmtUnfSF | 0.132644 | -0.002618 | 0.149040 | 0.181133 | 0.114442 | -0.495251 | -0.209294 | 1.000000 | 0.415360 | 0.317987 | 0.004469 | 0.028167 | 0.240257 | -0.422900 | -0.095804 | 0.288886 | -0.041118 | 0.166643 | 0.030086 | 0.250647 | 0.051575 | 0.190708 | 0.214175 | 0.183303 | -0.005316 | 0.129005 | -0.002538 | 0.020764 | -0.012579 | -0.035092 | -0.023837 | 0.034888 | -0.041258 | 0.214479 |
| TotalBsmtSF | 0.392075 | 0.260833 | 0.391452 | 0.291066 | 0.363936 | 0.522396 | 0.104810 | 0.415360 | 1.000000 | 0.819530 | -0.174512 | -0.033245 | 0.454868 | 0.307351 | -0.000315 | 0.323722 | -0.048804 | 0.050450 | -0.068901 | 0.285573 | 0.339519 | 0.322445 | 0.434585 | 0.486665 | 0.232019 | 0.247264 | -0.095478 | 0.037384 | 0.084489 | 0.126053 | -0.018479 | 0.013196 | -0.014969 | 0.613581 |
| 1stFlrSF | 0.457181 | 0.299475 | 0.281986 | 0.240379 | 0.344501 | 0.445863 | 0.097117 | 0.317987 | 0.819530 | 1.000000 | -0.202646 | -0.014241 | 0.566024 | 0.244671 | 0.001956 | 0.380637 | -0.119916 | 0.127401 | 0.068101 | 0.409516 | 0.410531 | 0.233449 | 0.439317 | 0.489782 | 0.235459 | 0.211671 | -0.065292 | 0.056104 | 0.088758 | 0.131525 | -0.021096 | 0.031372 | -0.013604 | 0.605852 |
| 2ndFlrSF | 0.080177 | 0.050986 | 0.010308 | 0.140024 | 0.174561 | -0.137079 | -0.099260 | 0.004469 | -0.174512 | -0.202646 | 1.000000 | 0.063353 | 0.687501 | -0.169494 | -0.023855 | 0.421378 | 0.609707 | 0.502901 | 0.059306 | 0.616423 | 0.194561 | 0.070832 | 0.183926 | 0.138347 | 0.092165 | 0.208026 | 0.061989 | -0.024358 | 0.040606 | 0.081487 | 0.016197 | 0.035164 | -0.028700 | 0.319334 |
| LowQualFinSF | 0.038469 | 0.004779 | -0.183784 | -0.062419 | -0.069071 | -0.064503 | 0.014807 | 0.028167 | -0.033245 | -0.014241 | 0.063353 | 1.000000 | 0.134683 | -0.047143 | -0.005842 | -0.000710 | -0.027080 | 0.105607 | 0.007522 | 0.131185 | -0.021272 | -0.036363 | -0.094480 | -0.067601 | -0.025444 | 0.018251 | 0.061081 | -0.004296 | 0.026799 | 0.062157 | -0.003793 | -0.022174 | -0.028921 | -0.025606 |
| GrLivArea | 0.402797 | 0.263116 | 0.199010 | 0.287389 | 0.390857 | 0.208171 | -0.009640 | 0.240257 | 0.454868 | 0.566024 | 0.687501 | 0.134683 | 1.000000 | 0.034836 | -0.018918 | 0.630012 | 0.415772 | 0.521270 | 0.100063 | 0.825489 | 0.461679 | 0.231197 | 0.467247 | 0.468997 | 0.247433 | 0.330224 | 0.009113 | 0.020643 | 0.101510 | 0.170205 | -0.002416 | 0.050240 | -0.036526 | 0.708624 |
| BsmtFullBath | 0.100949 | 0.158155 | 0.187599 | 0.119470 | 0.085310 | 0.649212 | 0.158678 | -0.422900 | 0.307351 | 0.244671 | -0.169494 | -0.047143 | 0.034836 | 1.000000 | -0.147871 | -0.064512 | -0.030905 | -0.150673 | -0.041503 | -0.053275 | 0.137928 | 0.124553 | 0.131881 | 0.179189 | 0.175315 | 0.067341 | -0.049911 | -0.000106 | 0.023148 | 0.067616 | -0.023047 | -0.025361 | 0.067049 | 0.227122 |
| BsmtHalfBath | -0.007234 | 0.048046 | -0.038162 | -0.012337 | 0.026673 | 0.067418 | 0.070948 | -0.095804 | -0.000315 | 0.001956 | -0.023855 | -0.005842 | -0.018918 | -0.147871 | 1.000000 | -0.054536 | -0.012340 | 0.046519 | -0.037944 | -0.023836 | 0.028976 | -0.077464 | -0.020891 | -0.024536 | 0.040161 | -0.025324 | -0.008555 | 0.035114 | 0.032121 | 0.020025 | -0.007367 | 0.032873 | -0.046524 | -0.016844 |
| FullBath | 0.198769 | 0.126031 | 0.468271 | 0.439046 | 0.276833 | 0.058543 | -0.076444 | 0.288886 | 0.323722 | 0.380637 | 0.421378 | -0.000710 | 0.630012 | -0.064512 | -0.054536 | 1.000000 | 0.136381 | 0.363252 | 0.133115 | 0.554784 | 0.243671 | 0.484557 | 0.469672 | 0.405656 | 0.187703 | 0.259977 | -0.115093 | 0.035353 | -0.008106 | 0.049604 | -0.014290 | 0.055872 | -0.019669 | 0.560664 |
| HalfBath | 0.053532 | 0.014259 | 0.242656 | 0.183331 | 0.201444 | 0.004262 | -0.032148 | -0.041118 | -0.048804 | -0.119916 | 0.609707 | -0.027080 | 0.415772 | -0.030905 | -0.012340 | 0.136381 | 1.000000 | 0.226651 | -0.068263 | 0.343415 | 0.203649 | 0.196785 | 0.219178 | 0.163549 | 0.108080 | 0.199740 | -0.095317 | -0.004972 | 0.072426 | 0.022381 | 0.001290 | -0.009050 | -0.010269 | 0.284108 |
| BedroomAbvGr | 0.263170 | 0.119690 | -0.070651 | -0.040581 | 0.102821 | -0.107355 | -0.015728 | 0.166643 | 0.050450 | 0.127401 | 0.502901 | 0.105607 | 0.521270 | -0.150673 | 0.046519 | 0.363252 | 0.226651 | 1.000000 | 0.198597 | 0.676620 | 0.107570 | -0.064518 | 0.086106 | 0.065253 | 0.046854 | 0.093810 | 0.041570 | -0.024478 | 0.044300 | 0.070703 | 0.007767 | 0.046544 | -0.036014 | 0.168213 |
| KitchenAbvGr | -0.006069 | -0.017784 | -0.174800 | -0.149598 | -0.037610 | -0.081007 | -0.040751 | 0.030086 | -0.068901 | 0.068101 | 0.059306 | 0.007522 | 0.100063 | -0.041503 | -0.037944 | 0.133115 | -0.068263 | 0.198597 | 1.000000 | 0.256045 | -0.123936 | -0.124411 | -0.050634 | -0.064433 | -0.090130 | -0.070091 | 0.037312 | -0.024600 | -0.051613 | -0.014525 | 0.062341 | 0.026589 | 0.031687 | -0.135907 |
| TotRmsAbvGrd | 0.352096 | 0.190015 | 0.095589 | 0.191740 | 0.280682 | 0.044316 | -0.035227 | 0.250647 | 0.285573 | 0.409516 | 0.616423 | 0.131185 | 0.825489 | -0.053275 | -0.023836 | 0.554784 | 0.343415 | 0.676620 | 0.256045 | 1.000000 | 0.326114 | 0.148112 | 0.362289 | 0.337822 | 0.165984 | 0.234192 | 0.004151 | -0.006683 | 0.059383 | 0.083757 | 0.024763 | 0.036907 | -0.034516 | 0.533723 |
| Fireplaces | 0.266639 | 0.271364 | 0.147716 | 0.112581 | 0.249070 | 0.260011 | 0.046921 | 0.051575 | 0.339519 | 0.410531 | 0.194561 | -0.021272 | 0.461679 | 0.137928 | 0.028976 | 0.243671 | 0.203649 | 0.107570 | -0.123936 | 0.326114 | 1.000000 | 0.046822 | 0.300789 | 0.269141 | 0.200019 | 0.169405 | -0.024822 | 0.011257 | 0.184530 | 0.095074 | 0.001409 | 0.046357 | -0.024096 | 0.466929 |
| GarageYrBlt | 0.070250 | -0.024947 | 0.825667 | 0.642277 | 0.252691 | 0.153484 | -0.088011 | 0.190708 | 0.322445 | 0.233449 | 0.070832 | -0.036363 | 0.231197 | 0.124553 | -0.077464 | 0.484557 | 0.196785 | -0.064518 | -0.124411 | 0.148112 | 0.046822 | 1.000000 | 0.588920 | 0.564567 | 0.224577 | 0.228425 | -0.297003 | 0.023544 | -0.075418 | -0.014501 | -0.032417 | 0.005337 | -0.001014 | 0.486362 |
| GarageCars | 0.285691 | 0.154871 | 0.537850 | 0.420622 | 0.364204 | 0.224054 | -0.038264 | 0.214175 | 0.434585 | 0.439317 | 0.183926 | -0.094480 | 0.467247 | 0.131881 | -0.020891 | 0.469672 | 0.219178 | 0.086106 | -0.050634 | 0.362289 | 0.300789 | 0.588920 | 1.000000 | 0.882475 | 0.226342 | 0.213569 | -0.151434 | 0.035765 | 0.050494 | 0.020934 | -0.043080 | 0.040522 | -0.039117 | 0.640409 |
| GarageArea | 0.344997 | 0.180403 | 0.478954 | 0.371600 | 0.373066 | 0.296970 | -0.018227 | 0.183303 | 0.486665 | 0.489782 | 0.138347 | -0.067601 | 0.468997 | 0.179189 | -0.024536 | 0.405656 | 0.163549 | 0.065253 | -0.064433 | 0.337822 | 0.269141 | 0.564567 | 0.882475 | 1.000000 | 0.224666 | 0.241435 | -0.121777 | 0.035087 | 0.051412 | 0.061047 | -0.027400 | 0.027974 | -0.027378 | 0.623431 |
| WoodDeckSF | 0.088521 | 0.171698 | 0.224880 | 0.205726 | 0.159718 | 0.204306 | 0.067898 | -0.005316 | 0.232019 | 0.235459 | 0.092165 | -0.025444 | 0.247433 | 0.175315 | 0.040161 | 0.187703 | 0.108080 | 0.046854 | -0.090130 | 0.165984 | 0.200019 | 0.224577 | 0.226342 | 0.224666 | 1.000000 | 0.058661 | -0.125989 | -0.032771 | -0.074181 | 0.073378 | -0.009551 | 0.021011 | 0.022270 | 0.324413 |
| OpenPorchSF | 0.151972 | 0.084774 | 0.188686 | 0.226298 | 0.125703 | 0.111761 | 0.003093 | 0.129005 | 0.247264 | 0.211671 | 0.208026 | 0.018251 | 0.330224 | 0.067341 | -0.025324 | 0.259977 | 0.199740 | 0.093810 | -0.070091 | 0.234192 | 0.169405 | 0.228425 | 0.213569 | 0.241435 | 0.058661 | 1.000000 | -0.093079 | -0.005842 | 0.074304 | 0.060762 | -0.018584 | 0.071255 | -0.057619 | 0.315856 |
| EnclosedPorch | 0.010700 | -0.018340 | -0.387268 | -0.193919 | -0.110204 | -0.102303 | 0.036543 | -0.002538 | -0.095478 | -0.065292 | 0.061989 | 0.061081 | 0.009113 | -0.049911 | -0.008555 | -0.115093 | -0.095317 | 0.041570 | 0.037312 | 0.004151 | -0.024822 | -0.297003 | -0.151434 | -0.121777 | -0.125989 | -0.093079 | 1.000000 | -0.037305 | -0.082864 | 0.054203 | 0.018361 | -0.028887 | -0.009916 | -0.128578 |
| 3SsnPorch | 0.070029 | 0.020423 | 0.031355 | 0.045286 | 0.018796 | 0.026451 | -0.029993 | 0.020764 | 0.037384 | 0.056104 | -0.024358 | -0.004296 | 0.020643 | -0.000106 | 0.035114 | 0.035353 | -0.004972 | -0.024478 | -0.024600 | -0.006683 | 0.011257 | 0.023544 | 0.035765 | 0.035087 | -0.032771 | -0.005842 | -0.037305 | 1.000000 | -0.031436 | -0.007992 | 0.000354 | 0.029474 | 0.018645 | 0.044584 |
| ScreenPorch | 0.041383 | 0.043160 | -0.050364 | -0.038740 | 0.061466 | 0.062021 | 0.088871 | -0.012579 | 0.084489 | 0.088758 | 0.040606 | 0.026799 | 0.101510 | 0.023148 | 0.032121 | -0.008106 | 0.072426 | 0.044300 | -0.051613 | 0.059383 | 0.184530 | -0.075418 | 0.050494 | 0.051412 | -0.074181 | 0.074304 | -0.082864 | -0.031436 | 1.000000 | 0.051307 | 0.031946 | 0.023217 | 0.010694 | 0.111447 |
| PoolArea | 0.206167 | 0.077672 | 0.004950 | 0.005829 | 0.011723 | 0.140491 | 0.041709 | -0.035092 | 0.126053 | 0.131525 | 0.081487 | 0.062157 | 0.170205 | 0.067616 | 0.020025 | 0.049604 | 0.022381 | 0.070703 | -0.014525 | 0.083757 | 0.095074 | -0.014501 | 0.020934 | 0.061047 | 0.073378 | 0.060762 | 0.054203 | -0.007992 | 0.051307 | 1.000000 | 0.029669 | -0.033737 | -0.059689 | 0.092404 |
| MiscVal | 0.003368 | 0.038068 | -0.034383 | -0.010286 | -0.029815 | 0.003571 | 0.004940 | -0.023837 | -0.018479 | -0.021096 | 0.016197 | -0.003793 | -0.002416 | -0.023047 | -0.007367 | -0.014290 | 0.001290 | 0.007767 | 0.062341 | 0.024763 | 0.001409 | -0.032417 | -0.043080 | -0.027400 | -0.009551 | -0.018584 | 0.018361 | 0.000354 | 0.031946 | 0.029669 | 1.000000 | -0.006495 | 0.004906 | -0.021190 |
| MoSold | 0.011200 | 0.001205 | 0.012398 | 0.021490 | -0.005965 | -0.015727 | -0.015211 | 0.034888 | 0.013196 | 0.031372 | 0.035164 | -0.022174 | 0.050240 | -0.025361 | 0.032873 | 0.055872 | -0.009050 | 0.046544 | 0.026589 | 0.036907 | 0.046357 | 0.005337 | 0.040522 | 0.027974 | 0.021011 | 0.071255 | -0.028887 | 0.029474 | 0.023217 | -0.033737 | -0.006495 | 1.000000 | -0.145721 | 0.046432 |
| YrSold | 0.007450 | -0.014261 | -0.013618 | 0.035743 | -0.008201 | 0.014359 | 0.031706 | -0.041258 | -0.014969 | -0.013604 | -0.028700 | -0.028921 | -0.036526 | 0.067049 | -0.046524 | -0.019669 | -0.010269 | -0.036014 | 0.031687 | -0.034516 | -0.024096 | -0.001014 | -0.039117 | -0.027378 | 0.022270 | -0.057619 | -0.009916 | 0.018645 | 0.010694 | -0.059689 | 0.004906 | -0.145721 | 1.000000 | -0.028923 |
| SalePrice | 0.351799 | 0.263843 | 0.522897 | 0.507101 | 0.477493 | 0.386420 | -0.011378 | 0.214479 | 0.613581 | 0.605852 | 0.319334 | -0.025606 | 0.708624 | 0.227122 | -0.016844 | 0.560664 | 0.284108 | 0.168213 | -0.135907 | 0.533723 | 0.466929 | 0.486362 | 0.640409 | 0.623431 | 0.324413 | 0.315856 | -0.128578 | 0.044584 | 0.111447 | 0.092404 | -0.021190 | 0.046432 | -0.028923 | 1.000000 |
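The last row of the matrix above is the one that matters most for modelling: GrLivArea (0.708624), GarageCars (0.640409), GarageArea (0.623431), TotalBsmtSF (0.613581) and 1stFlrSF (0.605852) show the strongest linear association with SalePrice, while GarageCars and GarageArea are also highly correlated with each other (0.882475). Reading this off an 34-column table is tedious, so one option is to extract and rank the target column. The sketch below assumes that approach and uses three columns from the first five train rows shown earlier, so its numbers are illustrative only.

```python
import pandas as pd

# Three columns from the first five train rows displayed earlier in the notebook.
df = pd.DataFrame({
    "GrLivArea": [1710, 1262, 1786, 1717, 2198],
    "YrSold": [2008, 2007, 2008, 2006, 2008],
    "SalePrice": [208500, 181500, 223500, 140000, 250000],
})

# Take the SalePrice column of the correlation matrix, drop the trivial
# self-correlation, and rank the rest by absolute value.
corr_with_target = (
    df.corr()["SalePrice"]
    .drop("SalePrice")
    .sort_values(key=abs, ascending=False)
)
print(corr_with_target)
```

Ranking by absolute value keeps strongly negative correlations near the top instead of pushing them to the bottom of the list.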
## Check correlation between numeric columns of test data.
test_data[test_num_columns.columns].corr()
| | LotFrontage | LotArea | YearBuilt | YearRemodAdd | MasVnrArea | BsmtFinSF1 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | 1stFlrSF | 2ndFlrSF | LowQualFinSF | GrLivArea | BsmtFullBath | BsmtHalfBath | FullBath | HalfBath | BedroomAbvGr | KitchenAbvGr | TotRmsAbvGrd | Fireplaces | GarageYrBlt | GarageCars | GarageArea | WoodDeckSF | OpenPorchSF | EnclosedPorch | 3SsnPorch | ScreenPorch | PoolArea | MiscVal | MoSold | YrSold |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LotFrontage | 1.000000 | 0.644608 | 0.122356 | 0.092603 | 0.251533 | 0.204621 | 0.046824 | 0.092031 | 0.315802 | 0.461239 | -0.036185 | -0.037294 | 0.357125 | 0.127314 | -0.042779 | 0.163078 | 0.023850 | 0.205100 | 0.016072 | 0.344366 | 0.257037 | 0.082069 | 0.336373 | 0.375581 | 0.157426 | 0.179795 | 0.013340 | -0.037487 | 0.113444 | 0.134232 | 0.068161 | 0.008810 | -0.025263 |
| LotArea | 0.644608 | 1.000000 | 0.048314 | 0.036907 | 0.188691 | 0.185470 | 0.054199 | 0.071681 | 0.283049 | 0.456417 | -0.007862 | -0.012457 | 0.366324 | 0.094052 | -0.008378 | 0.147871 | 0.079581 | 0.181171 | -0.031830 | 0.289576 | 0.282210 | 0.018330 | 0.263398 | 0.315841 | 0.158483 | 0.164815 | 0.099850 | -0.001846 | 0.088712 | 0.140494 | 0.139071 | 0.005152 | -0.051144 |
| YearBuilt | 0.122356 | 0.048314 | 1.000000 | 0.631696 | 0.312404 | 0.309595 | -0.008174 | 0.111892 | 0.425447 | 0.338733 | 0.025195 | -0.101154 | 0.290412 | 0.234922 | -0.022947 | 0.474028 | 0.296700 | -0.035923 | -0.098644 | 0.134839 | 0.193597 | 0.844150 | 0.538428 | 0.482497 | 0.233889 | 0.208040 | -0.363012 | -0.005442 | -0.031984 | -0.001060 | 0.007325 | 0.015599 | -0.011006 |
| YearRemodAdd | 0.092603 | 0.036907 | 0.631696 | 1.000000 | 0.213937 | 0.175219 | -0.056320 | 0.148773 | 0.304515 | 0.243793 | 0.177177 | -0.059973 | 0.347946 | 0.150371 | -0.076928 | 0.477064 | 0.238807 | -0.004413 | -0.135940 | 0.203619 | 0.153965 | 0.661765 | 0.431442 | 0.382034 | 0.230724 | 0.258049 | -0.243582 | 0.025823 | -0.053761 | -0.034862 | 0.003011 | 0.011771 | 0.029715 |
| MasVnrArea | 0.251533 | 0.188691 | 0.312404 | 0.213937 | 1.000000 | 0.343267 | 0.037546 | 0.064672 | 0.430966 | 0.446875 | 0.063659 | -0.045886 | 0.416648 | 0.198270 | 0.003992 | 0.242522 | 0.182094 | 0.053259 | -0.066331 | 0.275533 | 0.301575 | 0.257439 | 0.358488 | 0.375182 | 0.172721 | 0.163666 | -0.112814 | 0.005772 | 0.069339 | -0.005395 | 0.105723 | 0.005118 | -0.029556 |
| BsmtFinSF1 | 0.204621 | 0.185470 | 0.309595 | 0.175219 | 0.343267 | 1.000000 | -0.059522 | -0.459581 | 0.550444 | 0.470077 | -0.188952 | -0.068178 | 0.215692 | 0.628903 | 0.088971 | 0.104464 | -0.018966 | -0.119743 | -0.092190 | 0.060346 | 0.326075 | 0.232798 | 0.285959 | 0.323800 | 0.242369 | 0.136321 | -0.097441 | 0.088241 | 0.131414 | 0.012089 | 0.165403 | 0.013397 | 0.030779 |
| BsmtFinSF2 | 0.046824 | 0.054199 | -0.008174 | -0.056320 | 0.037546 | -0.059522 | 1.000000 | -0.265183 | 0.076092 | 0.073346 | -0.095935 | -0.023976 | -0.025137 | 0.166500 | 0.123715 | -0.074858 | -0.032627 | -0.044951 | -0.034790 | -0.060361 | 0.083655 | -0.051545 | 0.005806 | 0.022391 | 0.126032 | -0.014185 | 0.029010 | -0.014473 | 0.039806 | 0.050152 | -0.012808 | -0.003162 | -0.011749 |
| BsmtUnfSF | 0.092031 | 0.071681 | 0.111892 | 0.148773 | 0.064672 | -0.459581 | -0.265183 | 1.000000 | 0.409023 | 0.275569 | -0.006196 | 0.067212 | 0.226780 | -0.374659 | -0.117686 | 0.257721 | -0.030584 | 0.199661 | 0.102069 | 0.243566 | -0.043014 | 0.153246 | 0.147005 | 0.145625 | -0.073174 | 0.111249 | 0.012468 | -0.046230 | -0.085111 | -0.029672 | 0.000320 | 0.009132 | -0.035214 |
| TotalBsmtSF | 0.315802 | 0.283049 | 0.425447 | 0.304515 | 0.430966 | 0.550444 | 0.076092 | 0.409023 | 1.000000 | 0.784538 | -0.238633 | -0.013294 | 0.435576 | 0.343680 | 0.024722 | 0.331947 | -0.062711 | 0.056093 | -0.007879 | 0.278408 | 0.326100 | 0.372405 | 0.441401 | 0.485558 | 0.227192 | 0.244300 | -0.076275 | 0.039289 | 0.066942 | 0.003147 | 0.165227 | 0.021525 | -0.007817 |
| 1stFlrSF | 0.461239 | 0.456417 | 0.338733 | 0.243793 | 0.446875 | 0.470077 | 0.073346 | 0.275569 | 0.784538 | 1.000000 | -0.298222 | -0.011519 | 0.560631 | 0.278594 | 0.019899 | 0.365945 | -0.088929 | 0.090188 | 0.084255 | 0.374373 | 0.404611 | 0.284615 | 0.441707 | 0.494192 | 0.219573 | 0.263813 | -0.066071 | 0.028680 | 0.107902 | 0.112558 | 0.181387 | 0.048064 | -0.013566 |
| 2ndFlrSF | -0.036185 | -0.007862 | 0.025195 | 0.177177 | 0.063659 | -0.188952 | -0.095935 | -0.006196 | -0.238633 | -0.298222 | 1.000000 | -0.035688 | 0.618446 | -0.153136 | -0.095531 | 0.384515 | 0.613430 | 0.504458 | 0.079276 | 0.548464 | 0.143637 | 0.100355 | 0.181314 | 0.118704 | 0.087555 | 0.163780 | 0.048846 | -0.047828 | -0.018239 | -0.006163 | -0.022370 | -0.009415 | -0.010098 |
| LowQualFinSF | -0.037294 | -0.012457 | -0.101154 | -0.059973 | -0.045886 | -0.068178 | -0.023976 | 0.067212 | -0.013294 | -0.011519 | -0.035688 | 1.000000 | 0.050346 | -0.046815 | -0.020807 | -0.004986 | -0.053635 | 0.031993 | -0.008344 | 0.065386 | 0.008156 | -0.065324 | -0.038844 | -0.038496 | -0.005262 | -0.020197 | 0.115254 | -0.007149 | -0.013932 | -0.004606 | -0.007424 | 0.046473 | 0.026864 |
| GrLivArea | 0.357125 | 0.366324 | 0.290412 | 0.347946 | 0.416648 | 0.215692 | -0.025137 | 0.226780 | 0.435576 | 0.560631 | 0.618446 | 0.050346 | 1.000000 | 0.088789 | -0.069128 | 0.632701 | 0.453582 | 0.513831 | 0.137003 | 0.788012 | 0.456944 | 0.316098 | 0.515693 | 0.504555 | 0.255416 | 0.356366 | -0.001413 | -0.018561 | 0.071417 | 0.086542 | 0.128687 | 0.035472 | -0.017434 |
| BsmtFullBath | 0.127314 | 0.094052 | 0.234922 | 0.150371 | 0.198270 | 0.628903 | 0.166500 | -0.374659 | 0.343680 | 0.278594 | -0.153136 | -0.046815 | 0.088789 | 1.000000 | -0.150071 | 0.025620 | -0.035895 | -0.159455 | 0.006600 | -0.023118 | 0.201068 | 0.174664 | 0.189895 | 0.190121 | 0.196560 | 0.094315 | -0.085258 | 0.068396 | 0.081740 | 0.014860 | 0.009305 | 0.018309 | 0.023824 |
| BsmtHalfBath | -0.042779 | -0.008378 | -0.022947 | -0.076928 | 0.003992 | 0.088971 | 0.123715 | -0.117686 | 0.024722 | 0.019899 | -0.095531 | -0.020807 | -0.069128 | -0.150071 | 1.000000 | -0.040176 | -0.102009 | -0.006723 | -0.091839 | -0.074981 | 0.049807 | -0.041000 | -0.044934 | -0.018561 | 0.062275 | -0.044065 | -0.011173 | 0.018202 | 0.050842 | 0.127856 | 0.069801 | 0.015006 | 0.006073 |
| FullBath | 0.163078 | 0.147871 | 0.474028 | 0.477064 | 0.242522 | 0.104464 | -0.074858 | 0.257721 | 0.331947 | 0.365945 | 0.384515 | -0.004986 | 0.632701 | 0.025620 | -0.040176 | 1.000000 | 0.180297 | 0.349285 | 0.210972 | 0.500354 | 0.228681 | 0.506458 | 0.489982 | 0.411278 | 0.175053 | 0.260829 | -0.122930 | -0.012760 | -0.023736 | 0.000768 | -0.006936 | 0.037308 | 0.010283 |
| HalfBath | 0.023850 | 0.079581 | 0.296700 | 0.238807 | 0.182094 | -0.018966 | -0.032627 | -0.030584 | -0.062711 | -0.088929 | 0.613430 | -0.053635 | 0.453582 | -0.035895 | -0.102009 | 0.180297 | 1.000000 | 0.263638 | -0.015793 | 0.348590 | 0.207970 | 0.251409 | 0.249431 | 0.194229 | 0.125136 | 0.165246 | -0.069873 | -0.051598 | -0.000446 | -0.026345 | 0.046894 | 0.006309 | 0.013504 |
| BedroomAbvGr | 0.205100 | 0.181171 | -0.035923 | -0.004413 | 0.053259 | -0.119743 | -0.044951 | 0.199661 | 0.056093 | 0.090188 | 0.504458 | 0.031993 | 0.513831 | -0.159455 | -0.006723 | 0.349285 | 0.263638 | 1.000000 | 0.285674 | 0.664498 | 0.066133 | -0.028440 | 0.099297 | 0.082304 | 0.016902 | 0.079231 | 0.057772 | -0.085070 | -0.028374 | -0.007087 | -0.005398 | 0.064727 | -0.005113 |
| KitchenAbvGr | 0.016072 | -0.031830 | -0.098644 | -0.135940 | -0.066331 | -0.092190 | -0.034790 | 0.102069 | -0.007879 | 0.084255 | 0.079276 | -0.008344 | 0.137003 | 0.006600 | -0.091839 | 0.210972 | -0.015793 | 0.285674 | 1.000000 | 0.338219 | -0.091652 | -0.062182 | -0.023325 | -0.051080 | -0.084779 | -0.066172 | 0.018837 | -0.018113 | -0.061488 | -0.011669 | -0.005186 | 0.044159 | 0.038614 |
| TotRmsAbvGrd | 0.344366 | 0.289576 | 0.134839 | 0.203619 | 0.275533 | 0.060346 | -0.060361 | 0.243566 | 0.278408 | 0.374373 | 0.548464 | 0.065386 | 0.788012 | -0.023118 | -0.074981 | 0.500354 | 0.348590 | 0.664498 | 0.338219 | 1.000000 | 0.294427 | 0.176934 | 0.355386 | 0.320217 | 0.146832 | 0.244571 | 0.027953 | -0.059911 | 0.005290 | 0.055019 | 0.094063 | 0.050666 | -0.031627 |
| Fireplaces | 0.257037 | 0.282210 | 0.193597 | 0.153965 | 0.301575 | 0.326075 | 0.083655 | -0.043014 | 0.326100 | 0.404611 | 0.143637 | 0.008156 | 0.456944 | 0.201068 | 0.049807 | 0.228681 | 0.207970 | 0.066133 | -0.091652 | 0.294427 | 1.000000 | 0.128770 | 0.341988 | 0.320092 | 0.254528 | 0.149040 | 0.025176 | 0.028696 | 0.156343 | 0.105926 | 0.014802 | 0.016598 | 0.010002 |
| GarageYrBlt | 0.082069 | 0.018330 | 0.844150 | 0.661765 | 0.257439 | 0.232798 | -0.051545 | 0.153246 | 0.372405 | 0.284615 | 0.100355 | -0.065324 | 0.316098 | 0.174664 | -0.041000 | 0.506458 | 0.251409 | -0.028440 | -0.062182 | 0.176934 | 0.128770 | 1.000000 | 0.586649 | 0.548113 | 0.220850 | 0.235077 | -0.303646 | 0.016753 | -0.049821 | -0.015421 | 0.007926 | 0.040189 | -0.008451 |
| GarageCars | 0.336373 | 0.263398 | 0.538428 | 0.431442 | 0.358488 | 0.285959 | 0.005806 | 0.147005 | 0.441401 | 0.441707 | 0.181314 | -0.038844 | 0.515693 | 0.189895 | -0.044934 | 0.489982 | 0.249431 | 0.099297 | -0.023325 | 0.355386 | 0.341988 | 0.586649 | 1.000000 | 0.896674 | 0.254332 | 0.194292 | -0.116620 | 0.007189 | 0.036144 | 0.043302 | 0.002754 | 0.060845 | -0.007032 |
| GarageArea | 0.375581 | 0.315841 | 0.482497 | 0.382034 | 0.375182 | 0.323800 | 0.022391 | 0.145625 | 0.485558 | 0.494192 | 0.118704 | -0.038496 | 0.504555 | 0.190121 | -0.018561 | 0.411278 | 0.194229 | 0.082304 | -0.051080 | 0.320217 | 0.320092 | 0.548113 | 0.896674 | 1.000000 | 0.251051 | 0.224219 | -0.092776 | 0.022661 | 0.073089 | 0.043922 | 0.036352 | 0.052470 | 0.000536 |
| WoodDeckSF | 0.157426 | 0.158483 | 0.233889 | 0.230724 | 0.172721 | 0.242369 | 0.126032 | -0.073174 | 0.227192 | 0.219573 | 0.087555 | -0.005262 | 0.255416 | 0.196560 | 0.062275 | 0.175053 | 0.125136 | 0.016902 | -0.084779 | 0.146832 | 0.254528 | 0.220850 | 0.254332 | 0.251051 | 1.000000 | 0.019488 | -0.113036 | 0.036622 | -0.030682 | 0.123409 | 0.108898 | 0.014995 | -0.022818 |
| OpenPorchSF | 0.179795 | 0.164815 | 0.208040 | 0.258049 | 0.163666 | 0.136321 | -0.014185 | 0.111249 | 0.244300 | 0.263813 | 0.163780 | -0.020197 | 0.356366 | 0.094315 | -0.044065 | 0.260829 | 0.165246 | 0.079231 | -0.066172 | 0.244571 | 0.149040 | 0.235077 | 0.194292 | 0.224219 | 0.019488 | 1.000000 | -0.030918 | -0.013865 | 0.022233 | 0.070795 | 0.150404 | -0.000255 | -0.017122 |
| EnclosedPorch | 0.013340 | 0.099850 | -0.363012 | -0.243582 | -0.112814 | -0.097441 | 0.029010 | 0.012468 | -0.076275 | -0.066071 | 0.048846 | 0.115254 | -0.001413 | -0.085258 | -0.011173 | -0.122930 | -0.069873 | 0.057772 | 0.018837 | 0.027953 | 0.025176 | -0.303646 | -0.116620 | -0.092776 | -0.113036 | -0.030918 | 1.000000 | -0.027645 | -0.048550 | 0.142589 | 0.001353 | -0.012543 | 0.007616 |
| 3SsnPorch | -0.037487 | -0.001846 | -0.005442 | 0.025823 | 0.005772 | 0.088241 | -0.014473 | -0.046230 | 0.039289 | 0.028680 | -0.047828 | -0.007149 | -0.018561 | 0.068396 | 0.018202 | -0.012760 | -0.051598 | -0.085070 | -0.018113 | -0.059911 | 0.028696 | 0.016753 | 0.007189 | 0.022661 | 0.036622 | -0.013865 | -0.027645 | 1.000000 | -0.026785 | -0.005083 | -0.001242 | 0.022444 | 0.027818 |
| ScreenPorch | 0.113444 | 0.088712 | -0.031984 | -0.053761 | 0.069339 | 0.131414 | 0.039806 | -0.085111 | 0.066942 | 0.107902 | -0.018239 | -0.013932 | 0.071417 | 0.081740 | 0.050842 | -0.023736 | -0.000446 | -0.028374 | -0.061488 | 0.005290 | 0.156343 | -0.049821 | 0.036144 | 0.073089 | -0.030682 | 0.022233 | -0.048550 | -0.026785 | 1.000000 | -0.004897 | -0.012549 | 0.035212 | -0.023439 |
| PoolArea | 0.134232 | 0.140494 | -0.001060 | -0.034862 | -0.005395 | 0.012089 | 0.050152 | -0.029672 | 0.003147 | 0.112558 | -0.006163 | -0.004606 | 0.086542 | 0.014860 | 0.127856 | 0.000768 | -0.026345 | -0.007087 | -0.011669 | 0.055019 | 0.105926 | -0.015421 | 0.043302 | 0.043922 | 0.123409 | 0.070795 | 0.142589 | -0.005083 | -0.004897 | 1.000000 | -0.005279 | -0.055731 | -0.045185 |
| MiscVal | 0.068161 | 0.139071 | 0.007325 | 0.003011 | 0.105723 | 0.165403 | -0.012808 | 0.000320 | 0.165227 | 0.181387 | -0.022370 | -0.007424 | 0.128687 | 0.009305 | 0.069801 | -0.006936 | 0.046894 | -0.005398 | -0.005186 | 0.094063 | 0.014802 | 0.007926 | 0.002754 | 0.036352 | 0.108898 | 0.150404 | 0.001353 | -0.001242 | -0.012549 | -0.005279 | 1.000000 | 0.019369 | 0.011829 |
| MoSold | 0.008810 | 0.005152 | 0.015599 | 0.011771 | 0.005118 | 0.013397 | -0.003162 | 0.009132 | 0.021525 | 0.048064 | -0.009415 | 0.046473 | 0.035472 | 0.018309 | 0.015006 | 0.037308 | 0.006309 | 0.064727 | 0.044159 | 0.050666 | 0.016598 | 0.040189 | 0.060845 | 0.052470 | 0.014995 | -0.000255 | -0.012543 | 0.022444 | 0.035212 | -0.055731 | 0.019369 | 1.000000 | -0.163924 |
| YrSold | -0.025263 | -0.051144 | -0.011006 | 0.029715 | -0.029556 | 0.030779 | -0.011749 | -0.035214 | -0.007817 | -0.013566 | -0.010098 | 0.026864 | -0.017434 | 0.023824 | 0.006073 | 0.010283 | 0.013504 | -0.005113 | 0.038614 | -0.031627 | 0.010002 | -0.008451 | -0.007032 | 0.000536 | -0.022818 | -0.017122 | 0.007616 | 0.027818 | -0.023439 | -0.045185 | 0.011829 | -0.163924 | 1.000000 |
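A full correlation matrix of this size is hard to scan by eye. The sketch below shows one way to list only the strongly correlated feature pairs (for example, GarageCars and GarageArea at about 0.90 in the table above). The helper name `top_correlated_pairs`, the 0.8 threshold, and the toy values are assumptions made for illustration; they are not part of the original notebook.

```python
import numpy as np
import pandas as pd

def top_correlated_pairs(df, threshold=0.8):
    """Return feature pairs whose absolute Pearson correlation exceeds threshold."""
    corr = df.corr().abs()
    # Keep only the strict upper triangle so each pair is reported once
    # and the diagonal (always 1.0) is excluded.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack()
    return pairs[pairs > threshold].sort_values(ascending=False)

# Toy frame with made-up values, standing in for test_data[test_num_columns.columns].
toy = pd.DataFrame({
    "GarageCars": [1, 2, 2, 3, 0, 2],
    "GarageArea": [280, 520, 480, 800, 0, 560],
    "MoSold": [5, 1, 12, 7, 3, 9],
})
result = top_correlated_pairs(toy)
print(result)
```

On the real data the same call could be made as `top_correlated_pairs(test_data[test_num_columns.columns])`, assuming those names are defined as in the cells above.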
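## Calculate variance for numeric columns (rounded to the nearest integer).
def variance(x):
    # Rounding to whole numbers makes any variance below 0.5 display as 0.
    return pd.DataFrame({'Datatype': x.dtypes,
                         'Variance': [round(x[i].var()) for i in x]})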
## Get variance for numeric columns of train data.
variance(num_columns)
| | Datatype | Variance |
|---|---|---|
| LotFrontage | float64 | 590 |
| LotArea | int64 | 99625650 |
| YearBuilt | int64 | 912 |
| YearRemodAdd | int64 | 426 |
| MasVnrArea | float64 | 32785 |
| BsmtFinSF1 | int64 | 208025 |
| BsmtFinSF2 | int64 | 26024 |
| BsmtUnfSF | int64 | 195246 |
| TotalBsmtSF | int64 | 192462 |
| 1stFlrSF | int64 | 149450 |
| 2ndFlrSF | int64 | 190557 |
| LowQualFinSF | int64 | 2364 |
| GrLivArea | int64 | 276130 |
| BsmtFullBath | int64 | 0 |
| BsmtHalfBath | int64 | 0 |
| FullBath | int64 | 0 |
| HalfBath | int64 | 0 |
| BedroomAbvGr | int64 | 1 |
| KitchenAbvGr | int64 | 0 |
| TotRmsAbvGrd | int64 | 3 |
| Fireplaces | int64 | 0 |
| GarageYrBlt | float64 | 610 |
| GarageCars | int64 | 1 |
| GarageArea | int64 | 45713 |
| WoodDeckSF | int64 | 15710 |
| OpenPorchSF | int64 | 4390 |
| EnclosedPorch | int64 | 3736 |
| 3SsnPorch | int64 | 860 |
| ScreenPorch | int64 | 3109 |
| PoolArea | int64 | 1614 |
| MiscVal | int64 | 246138 |
| MoSold | int64 | 7 |
| YrSold | int64 | 2 |
| SalePrice | int64 | 6311111264 |
## Get variance for numeric columns of test data.
variance(test_num_columns)
| | Datatype | Variance |
|---|---|---|
| LotFrontage | float64 | 501 |
| LotArea | int64 | 24557152 |
| YearBuilt | int64 | 924 |
| YearRemodAdd | int64 | 446 |
| MasVnrArea | float64 | 31551 |
| BsmtFinSF1 | float64 | 207269 |
| BsmtFinSF2 | float64 | 31242 |
| BsmtUnfSF | float64 | 191197 |
| TotalBsmtSF | float64 | 196159 |
| 1stFlrSF | int64 | 158536 |
| 2ndFlrSF | int64 | 176913 |
| LowQualFinSF | int64 | 1940 |
| GrLivArea | int64 | 235774 |
| BsmtFullBath | float64 | 0 |
| BsmtHalfBath | float64 | 0 |
| FullBath | int64 | 0 |
| HalfBath | int64 | 0 |
| BedroomAbvGr | int64 | 1 |
| KitchenAbvGr | int64 | 0 |
| TotRmsAbvGrd | int64 | 2 |
| Fireplaces | int64 | 0 |
| GarageYrBlt | float64 | 699 |
| GarageCars | float64 | 1 |
| GarageArea | float64 | 47110 |
| WoodDeckSF | int64 | 16319 |
| OpenPorchSF | int64 | 4745 |
| EnclosedPorch | int64 | 4520 |
| 3SsnPorch | int64 | 408 |
| ScreenPorch | int64 | 3205 |
| PoolArea | int64 | 930 |
| MiscVal | int64 | 397917 |
| MoSold | int64 | 7 |
| YrSold | int64 | 2 |
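## Drop the columns whose rounded variance is 0 from the train data set.
## Note: these are small count features whose variance is below 0.5, not truly constant columns.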
cols = ['BsmtFullBath','BsmtHalfBath','FullBath','HalfBath','KitchenAbvGr','Fireplaces']
data = data.drop(cols,axis=1)
num_columns = num_columns.drop(cols,axis=1)
## Drop the same columns from the test data set.
test_data = test_data.drop(cols,axis=1)
test_num_columns = test_num_columns.drop(cols,axis=1)
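Because the `variance` helper rounds to whole numbers, a count column such as FullBath can display a variance of 0 while still varying from house to house. The sketch below, on made-up values, compares the unrounded and rounded variance and flags only columns that are actually constant. The helper name `zero_variance_columns` and its `tol` parameter are hypothetical and introduced only for this illustration.

```python
import pandas as pd

# Toy frame with made-up values: one count column, one constant column, one area column.
toy = pd.DataFrame({
    "FullBath": [1, 2, 2, 1, 2, 1, 2, 2],
    "Constant": [3, 3, 3, 3, 3, 3, 3, 3],
    "GrLivArea": [856, 1710, 1262, 1786, 2198, 961, 1694, 2090],
})

# Side-by-side view of the raw sample variance and its rounded value.
report = pd.DataFrame({
    "Variance": toy.var(),
    "Rounded": toy.var().round(),
})
print(report)

def zero_variance_columns(df, tol=0.0):
    """Return the names of columns whose sample variance is at most tol."""
    variances = df.var()
    return variances[variances <= tol].index.tolist()

constant_cols = zero_variance_columns(toy)
print(constant_cols)
```

Here FullBath rounds to 0 but is not flagged, while the constant column is. Whether low-variance count features should be kept is a modelling decision; this sketch only separates "rounds to zero" from "is zero".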
## Get the first record of train data after dropping these columns.
data[:1]
| MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | LotConfig | LandSlope | Neighborhood | Condition1 | Condition2 | BldgType | HouseStyle | OverallQual | OverallCond | YearBuilt | YearRemodAdd | RoofStyle | RoofMatl | Exterior1st | Exterior2nd | MasVnrType | MasVnrArea | ExterQual | ExterCond | Foundation | BsmtQual | BsmtCond | BsmtExposure | BsmtFinType1 | BsmtFinSF1 | BsmtFinType2 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | Heating | HeatingQC | CentralAir | Electrical | 1stFlrSF | 2ndFlrSF | LowQualFinSF | GrLivArea | BedroomAbvGr | KitchenQual | TotRmsAbvGrd | Functional | FireplaceQu | GarageType | GarageYrBlt | GarageFinish | GarageCars | GarageArea | GarageQual | GarageCond | PavedDrive | WoodDeckSF | OpenPorchSF | EnclosedPorch | 3SsnPorch | ScreenPorch | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Id | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 1 | 60 | RL | 65.0 | 8450 | Pave | NAA | Reg | Lvl | AllPub | Inside | Gtl | CollgCr | Norm | Norm | 1Fam | 2Story | 7 | 5 | 2003 | 2003 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 196.0 | Gd | TA | PConc | Gd | TA | No | GLQ | 706 | Unf | 0 | 150 | 856 | GasA | Ex | Y | SBrkr | 856 | 854 | 0 | 1710 | 3 | Gd | 8 | Typ | NF | Attchd | 2003.0 | RFn | 2 | 548 | TA | TA | Y | 0 | 61 | 0 | 0 | 0 | 0 | NP | NF | NE | 0 | 2 | 2008 | WD | Normal | 208500 |
## Get the first record of test data after dropping these columns.
test_data[:1]
| Id | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | LotConfig | LandSlope | Neighborhood | Condition1 | Condition2 | BldgType | HouseStyle | OverallQual | OverallCond | YearBuilt | YearRemodAdd | RoofStyle | RoofMatl | Exterior1st | Exterior2nd | MasVnrType | MasVnrArea | ExterQual | ExterCond | Foundation | BsmtQual | BsmtCond | BsmtExposure | BsmtFinType1 | BsmtFinSF1 | BsmtFinType2 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | Heating | HeatingQC | CentralAir | Electrical | 1stFlrSF | 2ndFlrSF | LowQualFinSF | GrLivArea | BedroomAbvGr | KitchenQual | TotRmsAbvGrd | Functional | FireplaceQu | GarageType | GarageYrBlt | GarageFinish | GarageCars | GarageArea | GarageQual | GarageCond | PavedDrive | WoodDeckSF | OpenPorchSF | EnclosedPorch | 3SsnPorch | ScreenPorch | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1461 | 20 | RH | 80.0 | 11622 | Pave | NAA | Reg | Lvl | AllPub | Inside | Gtl | NAmes | Feedr | Norm | 1Fam | 1Story | 5 | 6 | 1961 | 1961 | Gable | CompShg | VinylSd | VinylSd | None | 0.0 | TA | TA | CBlock | TA | TA | No | Rec | 468.0 | LwQ | 144.0 | 270.0 | 882.0 | GasA | TA | Y | SBrkr | 896 | 0 | 0 | 896 | 2 | TA | 5 | Typ | NF | Attchd | 1961.0 | Unf | 1.0 | 730.0 | TA | TA | Y | 140 | 0 | 0 | 0 | 120 | 0 | NP | MnPrv | NE | 0 | 6 | 2010 | WD | Normal |
## Separate target and predictors and display dimensions.
features = data.drop('SalePrice', axis = 1)
print(features.shape)
target = data['SalePrice']
print(target.shape)
(1460, 73) (1460,)
## Display dimensions of test data.
test_data.shape
(1459, 73)
## Split data into train (70%) and validation (30%) sets.
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.3, random_state=123)
## Separate category and numeric columns for train data.
catcols_train = X_train.select_dtypes(include=['object','category'])
numcols_train = X_train.select_dtypes(include=['int64', 'float64'])
## Separate category and numeric columns for test data.
test_catcols = test_data.select_dtypes(include=['object','category'])
test_numcols = test_data.select_dtypes(include=['int64', 'float64'])
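The dtype lists passed to `select_dtypes` above match only 64-bit integer and float columns. The sketch below, on a made-up frame, shows how a column stored with a different numeric width would be missed by that list and how `include='number'` picks up every numeric dtype. The toy frame and its column names are assumptions for illustration; whether the broader selector is wanted depends on how the columns were typed earlier in the notebook.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values: an int32 column, a float64 column, and a text column.
toy = pd.DataFrame({
    "GarageCars": np.array([1, 2, 2], dtype="int32"),
    "LotFrontage": [65.0, 80.0, 68.0],
    "MSZoning": ["RL", "RM", "RL"],
})

# Explicit list: matches only the exact dtypes named.
narrow = toy.select_dtypes(include=["int64", "float64"]).columns.tolist()

# Generic selector: matches every numeric dtype.
broad = toy.select_dtypes(include="number").columns.tolist()

print(narrow)
print(broad)
```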
## Display dimensions and column names of category and numeric columns of train data.
print(catcols_train.shape)
print(catcols_train.columns)
print(numcols_train.shape)
print(numcols_train.columns)
(1022, 46)
Index(['MSSubClass', 'MSZoning', 'Street', 'Alley', 'LotShape', 'LandContour',
'Utilities', 'LotConfig', 'LandSlope', 'Neighborhood', 'Condition1',
'Condition2', 'BldgType', 'HouseStyle', 'OverallQual', 'OverallCond',
'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd', 'MasVnrType',
'ExterQual', 'ExterCond', 'Foundation', 'BsmtQual', 'BsmtCond',
'BsmtExposure', 'BsmtFinType1', 'BsmtFinType2', 'Heating', 'HeatingQC',
'CentralAir', 'Electrical', 'KitchenQual', 'Functional', 'FireplaceQu',
'GarageType', 'GarageFinish', 'GarageQual', 'GarageCond', 'PavedDrive',
'PoolQC', 'Fence', 'MiscFeature', 'SaleType', 'SaleCondition'],
dtype='object')
(1022, 27)
Index(['LotFrontage', 'LotArea', 'YearBuilt', 'YearRemodAdd', 'MasVnrArea',
'BsmtFinSF1', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', '1stFlrSF',
'2ndFlrSF', 'LowQualFinSF', 'GrLivArea', 'BedroomAbvGr', 'TotRmsAbvGrd',
'GarageYrBlt', 'GarageCars', 'GarageArea', 'WoodDeckSF', 'OpenPorchSF',
'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea', 'MiscVal',
'MoSold', 'YrSold'],
dtype='object')
## Display dimensions and column names of category and numeric columns of test data.
print(test_catcols.shape)
print(test_catcols.columns)
print(test_numcols.shape)
print(test_numcols.columns)
(1459, 46)
Index(['MSSubClass', 'MSZoning', 'Street', 'Alley', 'LotShape', 'LandContour',
'Utilities', 'LotConfig', 'LandSlope', 'Neighborhood', 'Condition1',
'Condition2', 'BldgType', 'HouseStyle', 'OverallQual', 'OverallCond',
'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd', 'MasVnrType',
'ExterQual', 'ExterCond', 'Foundation', 'BsmtQual', 'BsmtCond',
'BsmtExposure', 'BsmtFinType1', 'BsmtFinType2', 'Heating', 'HeatingQC',
'CentralAir', 'Electrical', 'KitchenQual', 'Functional', 'FireplaceQu',
'GarageType', 'GarageFinish', 'GarageQual', 'GarageCond', 'PavedDrive',
'PoolQC', 'Fence', 'MiscFeature', 'SaleType', 'SaleCondition'],
dtype='object')
(1459, 27)
Index(['LotFrontage', 'LotArea', 'YearBuilt', 'YearRemodAdd', 'MasVnrArea',
'BsmtFinSF1', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', '1stFlrSF',
'2ndFlrSF', 'LowQualFinSF', 'GrLivArea', 'BedroomAbvGr', 'TotRmsAbvGrd',
'GarageYrBlt', 'GarageCars', 'GarageArea', 'WoodDeckSF', 'OpenPorchSF',
'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea', 'MiscVal',
'MoSold', 'YrSold'],
dtype='object')
## Separate category and numeric columns for validation data.
catcols_test = X_test.select_dtypes(include=['object','category'])
numcols_test = X_test.select_dtypes(include=['int64', 'float64'])
## Display dimensions of the category columns of validation data.
catcols_test.shape
(438, 46)
################################################### Imputation ###############################################################
## Import the imputer used for filling null values.
from sklearn.impute import SimpleImputer
## Instantiate numeric and category imputers.
num_imputer = SimpleImputer(strategy = 'median')
cat_imputer = SimpleImputer(strategy = 'most_frequent')
## Fit numeric imputer.
num_imputer.fit(numcols_train)
## Impute NA values in the numeric columns of train data and build a data frame.
X_train_imp = num_imputer.transform(numcols_train)
X_train_imp = pd.DataFrame(X_train_imp, columns=numcols_train.columns)
## Check NA values for numeric columns of train data after imputing.
X_train_imp.isna().sum()
LotFrontage 0 LotArea 0 YearBuilt 0 YearRemodAdd 0 MasVnrArea 0 BsmtFinSF1 0 BsmtFinSF2 0 BsmtUnfSF 0 TotalBsmtSF 0 1stFlrSF 0 2ndFlrSF 0 LowQualFinSF 0 GrLivArea 0 BedroomAbvGr 0 TotRmsAbvGrd 0 GarageYrBlt 0 GarageCars 0 GarageArea 0 WoodDeckSF 0 OpenPorchSF 0 EnclosedPorch 0 3SsnPorch 0 ScreenPorch 0 PoolArea 0 MiscVal 0 MoSold 0 YrSold 0 dtype: int64
## Check the first 5 records of the numeric columns of train data.
X_train_imp.head()
| | LotFrontage | LotArea | YearBuilt | YearRemodAdd | MasVnrArea | BsmtFinSF1 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | 1stFlrSF | 2ndFlrSF | LowQualFinSF | GrLivArea | BedroomAbvGr | TotRmsAbvGrd | GarageYrBlt | GarageCars | GarageArea | WoodDeckSF | OpenPorchSF | EnclosedPorch | 3SsnPorch | ScreenPorch | PoolArea | MiscVal | MoSold | YrSold |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 57.0 | 8846.0 | 1996.0 | 1996.0 | 0.0 | 298.0 | 0.0 | 572.0 | 870.0 | 914.0 | 0.0 | 0.0 | 914.0 | 2.0 | 5.0 | 1998.0 | 2.0 | 576.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 7.0 | 2006.0 |
| 1 | 55.0 | 5350.0 | 1940.0 | 1966.0 | 0.0 | 0.0 | 0.0 | 728.0 | 728.0 | 1306.0 | 0.0 | 0.0 | 1306.0 | 3.0 | 6.0 | 1979.0 | 0.0 | 0.0 | 263.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 450.0 | 5.0 | 2010.0 |
| 2 | 70.0 | 8521.0 | 1967.0 | 1967.0 | 0.0 | 842.0 | 0.0 | 70.0 | 912.0 | 912.0 | 0.0 | 0.0 | 912.0 | 3.0 | 5.0 | 1974.0 | 1.0 | 336.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.0 | 2010.0 |
| 3 | 84.0 | 8658.0 | 1965.0 | 1965.0 | 101.0 | 643.0 | 0.0 | 445.0 | 1088.0 | 1324.0 | 0.0 | 0.0 | 1324.0 | 3.0 | 6.0 | 1965.0 | 2.0 | 440.0 | 0.0 | 138.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 12.0 | 2006.0 |
| 4 | 64.0 | 6762.0 | 2007.0 | 2007.0 | 108.0 | 664.0 | 0.0 | 544.0 | 1208.0 | 1208.0 | 0.0 | 0.0 | 1208.0 | 2.0 | 6.0 | 2007.0 | 2.0 | 628.0 | 105.0 | 54.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 9.0 | 2007.0 |
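`SimpleImputer.transform` returns a NumPy array, so wrapping it in `pd.DataFrame` without an index gives a fresh 0..n-1 index, as in the table above, instead of the original Id labels. If the imputed rows later need to be aligned with the target by label, passing `index=` keeps them. The sketch below uses made-up values and hypothetical names (`toy_train`, `imputed`); it is one option under those assumptions, not the notebook's own code.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy training frame with made-up values and a shuffled Id index,
# similar to what train_test_split produces.
toy_train = pd.DataFrame(
    {"LotFrontage": [65.0, np.nan, 80.0], "LotArea": [8450.0, 9600.0, 11250.0]},
    index=pd.Index([101, 57, 899], name="Id"),
)

imputer = SimpleImputer(strategy="median")

# Pass the original index so row labels survive the round trip through NumPy.
imputed = pd.DataFrame(
    imputer.fit_transform(toy_train),
    columns=toy_train.columns,
    index=toy_train.index,
)
print(imputed)
```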
## Fit category imputer.
cat_imputer.fit(catcols_train)
## Impute NA values in the category columns of train data and build a data frame.
X_train_imp_cat = cat_imputer.transform(catcols_train)
X_train_imp_cat = pd.DataFrame(X_train_imp_cat, columns=catcols_train.columns)
## Check dimensions of the category columns of train data.
X_train_imp_cat.shape
(1022, 46)
## Check NA values for category columns of train data after imputation.
X_train_imp_cat.isna().sum()
MSSubClass 0 MSZoning 0 Street 0 Alley 0 LotShape 0 LandContour 0 Utilities 0 LotConfig 0 LandSlope 0 Neighborhood 0 Condition1 0 Condition2 0 BldgType 0 HouseStyle 0 OverallQual 0 OverallCond 0 RoofStyle 0 RoofMatl 0 Exterior1st 0 Exterior2nd 0 MasVnrType 0 ExterQual 0 ExterCond 0 Foundation 0 BsmtQual 0 BsmtCond 0 BsmtExposure 0 BsmtFinType1 0 BsmtFinType2 0 Heating 0 HeatingQC 0 CentralAir 0 Electrical 0 KitchenQual 0 Functional 0 FireplaceQu 0 GarageType 0 GarageFinish 0 GarageQual 0 GarageCond 0 PavedDrive 0 PoolQC 0 Fence 0 MiscFeature 0 SaleType 0 SaleCondition 0 dtype: int64
## Check first 5 records of category columns of train data.
X_train_imp_cat.head()
| | MSSubClass | MSZoning | Street | Alley | LotShape | LandContour | Utilities | LotConfig | LandSlope | Neighborhood | Condition1 | Condition2 | BldgType | HouseStyle | OverallQual | OverallCond | RoofStyle | RoofMatl | Exterior1st | Exterior2nd | MasVnrType | ExterQual | ExterCond | Foundation | BsmtQual | BsmtCond | BsmtExposure | BsmtFinType1 | BsmtFinType2 | Heating | HeatingQC | CentralAir | Electrical | KitchenQual | Functional | FireplaceQu | GarageType | GarageFinish | GarageQual | GarageCond | PavedDrive | PoolQC | Fence | MiscFeature | SaleType | SaleCondition |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 85 | RL | Pave | NAA | IR1 | Lvl | AllPub | CulDSac | Gtl | CollgCr | Norm | Norm | 1Fam | SFoyer | 5 | 5 | Gable | CompShg | VinylSd | VinylSd | None | Gd | TA | PConc | Gd | TA | Av | GLQ | Unf | GasA | Ex | Y | SBrkr | TA | Typ | NF | Detchd | Unf | TA | TA | Y | NP | NF | NE | WD | Normal |
| 1 | 30 | RL | Pave | NAA | IR1 | Lvl | AllPub | Inside | Gtl | BrkSide | Norm | Norm | 1Fam | 1Story | 3 | 2 | Gable | CompShg | Wd Sdng | Plywood | None | TA | Po | CBlock | TA | TA | No | Unf | Unf | GasA | Ex | Y | SBrkr | Fa | Mod | NF | NG | NG | NG | NG | Y | NP | GdWo | Shed | WD | Normal |
| 2 | 20 | RL | Pave | NAA | Reg | Lvl | AllPub | FR2 | Gtl | Sawyer | Feedr | Norm | 1Fam | 1Story | 5 | 5 | Gable | CompShg | HdBoard | HdBoard | None | TA | TA | CBlock | TA | TA | No | ALQ | Unf | GasA | TA | Y | SBrkr | TA | Typ | Fa | Detchd | Unf | TA | TA | Y | NP | MnPrv | NE | WD | Normal |
| 3 | 20 | RL | Pave | NAA | Reg | Lvl | AllPub | Inside | Gtl | NAmes | Norm | Norm | 1Fam | 1Story | 6 | 5 | Gable | CompShg | Wd Sdng | Wd Sdng | BrkFace | TA | TA | CBlock | TA | TA | No | Rec | Unf | GasA | Ex | Y | SBrkr | TA | Typ | TA | Attchd | RFn | TA | TA | Y | NP | GdWo | NE | WD | Abnorml |
| 4 | 20 | RL | Pave | NAA | Reg | Lvl | AllPub | Inside | Gtl | CollgCr | Norm | Norm | 1Fam | 1Story | 7 | 5 | Gable | CompShg | VinylSd | VinylSd | BrkFace | Gd | TA | PConc | Gd | TA | No | GLQ | Unf | GasA | Ex | Y | SBrkr | Gd | Typ | NF | Attchd | RFn | TA | TA | Y | NP | NF | NE | New | Partial |
## Impute NA values in the numeric columns of validation data, build a data frame, and display the first 5 records.
X_test_imp = num_imputer.transform(numcols_test)
X_test_imp = pd.DataFrame(X_test_imp, columns=numcols_test.columns)
X_test_imp.head()
| | LotFrontage | LotArea | YearBuilt | YearRemodAdd | MasVnrArea | BsmtFinSF1 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | 1stFlrSF | 2ndFlrSF | LowQualFinSF | GrLivArea | BedroomAbvGr | TotRmsAbvGrd | GarageYrBlt | GarageCars | GarageArea | WoodDeckSF | OpenPorchSF | EnclosedPorch | 3SsnPorch | ScreenPorch | PoolArea | MiscVal | MoSold | YrSold |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 68.0 | 9505.0 | 2001.0 | 2001.0 | 180.0 | 0.0 | 0.0 | 884.0 | 884.0 | 884.0 | 1151.0 | 0.0 | 2035.0 | 3.0 | 8.0 | 2001.0 | 2.0 | 434.0 | 144.0 | 48.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.0 | 2010.0 |
| 1 | 60.0 | 9600.0 | 1900.0 | 1950.0 | 0.0 | 0.0 | 0.0 | 1095.0 | 1095.0 | 1095.0 | 679.0 | 0.0 | 1774.0 | 4.0 | 8.0 | 1920.0 | 3.0 | 779.0 | 0.0 | 0.0 | 90.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.0 | 2006.0 |
| 2 | 32.0 | 3363.0 | 2004.0 | 2004.0 | 117.0 | 0.0 | 0.0 | 976.0 | 976.0 | 976.0 | 732.0 | 0.0 | 1708.0 | 3.0 | 7.0 | 2004.0 | 2.0 | 380.0 | 0.0 | 40.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.0 | 2006.0 |
| 3 | 75.0 | 9750.0 | 1998.0 | 1998.0 | 0.0 | 975.0 | 0.0 | 133.0 | 1108.0 | 1108.0 | 989.0 | 0.0 | 2097.0 | 3.0 | 8.0 | 1998.0 | 2.0 | 583.0 | 253.0 | 170.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 6.0 | 2006.0 |
| 4 | 60.0 | 10930.0 | 1945.0 | 1950.0 | 0.0 | 580.0 | 0.0 | 333.0 | 913.0 | 1048.0 | 510.0 | 0.0 | 1558.0 | 3.0 | 6.0 | 1962.0 | 1.0 | 288.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.0 | 2008.0 |
## Check NA values for numeric columns of validation data after imputing.
X_test_imp.isna().sum()
LotFrontage 0 LotArea 0 YearBuilt 0 YearRemodAdd 0 MasVnrArea 0 BsmtFinSF1 0 BsmtFinSF2 0 BsmtUnfSF 0 TotalBsmtSF 0 1stFlrSF 0 2ndFlrSF 0 LowQualFinSF 0 GrLivArea 0 BedroomAbvGr 0 TotRmsAbvGrd 0 GarageYrBlt 0 GarageCars 0 GarageArea 0 WoodDeckSF 0 OpenPorchSF 0 EnclosedPorch 0 3SsnPorch 0 ScreenPorch 0 PoolArea 0 MiscVal 0 MoSold 0 YrSold 0 dtype: int64
## Impute NA values in the category columns of validation data, build a data frame, and display the first 5 records.
X_test_imp_cat = cat_imputer.transform(catcols_test)
X_test_imp_cat = pd.DataFrame(X_test_imp_cat, columns=catcols_test.columns)
X_test_imp_cat.head()
| | MSSubClass | MSZoning | Street | Alley | LotShape | LandContour | Utilities | LotConfig | LandSlope | Neighborhood | Condition1 | Condition2 | BldgType | HouseStyle | OverallQual | OverallCond | RoofStyle | RoofMatl | Exterior1st | Exterior2nd | MasVnrType | ExterQual | ExterCond | Foundation | BsmtQual | BsmtCond | BsmtExposure | BsmtFinType1 | BsmtFinType2 | Heating | HeatingQC | CentralAir | Electrical | KitchenQual | Functional | FireplaceQu | GarageType | GarageFinish | GarageQual | GarageCond | PavedDrive | PoolQC | Fence | MiscFeature | SaleType | SaleCondition |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 60 | RL | Pave | NAA | IR1 | Lvl | AllPub | CulDSac | Gtl | Gilbert | Norm | Norm | 1Fam | 2Story | 7 | 5 | Gable | CompShg | VinylSd | VinylSd | BrkFace | Gd | TA | PConc | Gd | TA | No | Unf | Unf | GasA | Ex | Y | SBrkr | Gd | Typ | Gd | BuiltIn | Fin | TA | TA | Y | NP | NF | NE | WD | Normal |
| 1 | 70 | RM | Pave | Grvl | Reg | Lvl | AllPub | Inside | Gtl | OldTown | Norm | Norm | 1Fam | 2Story | 4 | 2 | Gable | CompShg | AsbShng | Stucco | None | TA | TA | BrkTil | TA | Fa | No | Unf | Unf | GasW | Fa | N | SBrkr | TA | Min2 | NF | 2Types | Unf | Fa | Fa | N | NP | NF | NE | WD | Normal |
| 2 | 160 | RM | Pave | NAA | Reg | Lvl | AllPub | Inside | Gtl | Edwards | Norm | Norm | TwnhsE | 2Story | 7 | 5 | Gable | CompShg | VinylSd | VinylSd | Stone | Gd | TA | PConc | Gd | TA | No | Unf | Unf | GasA | Ex | Y | SBrkr | Gd | Maj1 | NF | Detchd | Unf | TA | TA | Y | NP | NF | NE | WD | Normal |
| 3 | 60 | RL | Pave | NAA | Reg | Lvl | AllPub | Corner | Gtl | CollgCr | Norm | Norm | 1Fam | 2Story | 7 | 6 | Gable | CompShg | VinylSd | VinylSd | None | TA | TA | PConc | Gd | TA | Av | GLQ | Unf | GasA | Ex | Y | SBrkr | Gd | Typ | TA | Detchd | RFn | TA | TA | Y | NP | NF | NE | WD | Normal |
| 4 | 50 | RL | Pave | Grvl | Reg | Bnk | AllPub | Inside | Gtl | NAmes | Artery | Norm | 1Fam | 1.5Fin | 5 | 6 | Gable | CompShg | MetalSd | MetalSd | None | TA | TA | CBlock | TA | TA | No | BLQ | Unf | GasA | TA | Y | FuseA | TA | Typ | TA | Attchd | Unf | TA | TA | Y | NP | NF | NE | WD | Normal |
## Display dimensions of category columns of validation data.
X_test_imp_cat.shape
(438, 46)
## Check NA values for category columns of validation data after imputation.
X_test_imp_cat.isna().sum()
MSSubClass 0 MSZoning 0 Street 0 Alley 0 LotShape 0 LandContour 0 Utilities 0 LotConfig 0 LandSlope 0 Neighborhood 0 Condition1 0 Condition2 0 BldgType 0 HouseStyle 0 OverallQual 0 OverallCond 0 RoofStyle 0 RoofMatl 0 Exterior1st 0 Exterior2nd 0 MasVnrType 0 ExterQual 0 ExterCond 0 Foundation 0 BsmtQual 0 BsmtCond 0 BsmtExposure 0 BsmtFinType1 0 BsmtFinType2 0 Heating 0 HeatingQC 0 CentralAir 0 Electrical 0 KitchenQual 0 Functional 0 FireplaceQu 0 GarageType 0 GarageFinish 0 GarageQual 0 GarageCond 0 PavedDrive 0 PoolQC 0 Fence 0 MiscFeature 0 SaleType 0 SaleCondition 0 dtype: int64
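## The pattern above (fit the imputer on train, then only transform validation data) can be sketched on toy data.
## Assumptions: the toy frames and the names toy_train, toy_valid, toy_imputer are hypothetical, and
## strategy="most_frequent" is a guess, since the configuration of cat_imputer is not shown in this section.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical stand-ins for the train and validation category columns.
toy_train = pd.DataFrame({"Alley": ["Grvl", np.nan, "Pave", "Grvl"],
                          "Fence": ["MnPrv", "MnPrv", np.nan, "GdWo"]})
toy_valid = pd.DataFrame({"Alley": [np.nan, "Pave"],
                          "Fence": [np.nan, "GdWo"]})

# Fit on train only, so the fill values come from the training data.
toy_imputer = SimpleImputer(strategy="most_frequent")
toy_imputer.fit(toy_train)

# transform() returns a NumPy array, so the column names are restored by hand.
toy_valid_imp = pd.DataFrame(toy_imputer.transform(toy_valid),
                             columns=toy_valid.columns)
print(toy_valid_imp)
print(toy_valid_imp.isna().sum())
```

## Reusing the fitted imputer keeps validation rows from influencing the fill values.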
## Impute NA values in the numeric columns of the test data, build a dataframe, and display the first 5 records.
test_imp = num_imputer.transform(test_numcols)
test_imp = pd.DataFrame(test_imp, columns=test_numcols.columns)
test_imp.head()
| | LotFrontage | LotArea | YearBuilt | YearRemodAdd | MasVnrArea | BsmtFinSF1 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | 1stFlrSF | 2ndFlrSF | LowQualFinSF | GrLivArea | BedroomAbvGr | TotRmsAbvGrd | GarageYrBlt | GarageCars | GarageArea | WoodDeckSF | OpenPorchSF | EnclosedPorch | 3SsnPorch | ScreenPorch | PoolArea | MiscVal | MoSold | YrSold |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 80.0 | 11622.0 | 1961.0 | 1961.0 | 0.0 | 468.0 | 144.0 | 270.0 | 882.0 | 896.0 | 0.0 | 0.0 | 896.0 | 2.0 | 5.0 | 1961.0 | 1.0 | 730.0 | 140.0 | 0.0 | 0.0 | 0.0 | 120.0 | 0.0 | 0.0 | 6.0 | 2010.0 |
| 1 | 81.0 | 14267.0 | 1958.0 | 1958.0 | 108.0 | 923.0 | 0.0 | 406.0 | 1329.0 | 1329.0 | 0.0 | 0.0 | 1329.0 | 3.0 | 6.0 | 1958.0 | 1.0 | 312.0 | 393.0 | 36.0 | 0.0 | 0.0 | 0.0 | 0.0 | 12500.0 | 6.0 | 2010.0 |
| 2 | 74.0 | 13830.0 | 1997.0 | 1998.0 | 0.0 | 791.0 | 0.0 | 137.0 | 928.0 | 928.0 | 701.0 | 0.0 | 1629.0 | 3.0 | 6.0 | 1997.0 | 2.0 | 482.0 | 212.0 | 34.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 2010.0 |
| 3 | 78.0 | 9978.0 | 1998.0 | 1998.0 | 20.0 | 602.0 | 0.0 | 324.0 | 926.0 | 926.0 | 678.0 | 0.0 | 1604.0 | 3.0 | 7.0 | 1998.0 | 2.0 | 470.0 | 360.0 | 36.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 6.0 | 2010.0 |
| 4 | 43.0 | 5005.0 | 1992.0 | 1992.0 | 0.0 | 263.0 | 0.0 | 1017.0 | 1280.0 | 1280.0 | 0.0 | 0.0 | 1280.0 | 2.0 | 5.0 | 1992.0 | 2.0 | 506.0 | 0.0 | 82.0 | 0.0 | 0.0 | 144.0 | 0.0 | 0.0 | 1.0 | 2010.0 |
## Check NA values for numeric columns of test data after imputing.
test_imp.isna().sum()
LotFrontage 0 LotArea 0 YearBuilt 0 YearRemodAdd 0 MasVnrArea 0 BsmtFinSF1 0 BsmtFinSF2 0 BsmtUnfSF 0 TotalBsmtSF 0 1stFlrSF 0 2ndFlrSF 0 LowQualFinSF 0 GrLivArea 0 BedroomAbvGr 0 TotRmsAbvGrd 0 GarageYrBlt 0 GarageCars 0 GarageArea 0 WoodDeckSF 0 OpenPorchSF 0 EnclosedPorch 0 3SsnPorch 0 ScreenPorch 0 PoolArea 0 MiscVal 0 MoSold 0 YrSold 0 dtype: int64
## Impute NA values in the category columns of the test data, build a dataframe, and display the first 5 records.
test_imp_cat = cat_imputer.transform(test_catcols)
test_imp_cat = pd.DataFrame(test_imp_cat,columns=test_catcols.columns)
test_imp_cat.head()
| | MSSubClass | MSZoning | Street | Alley | LotShape | LandContour | Utilities | LotConfig | LandSlope | Neighborhood | Condition1 | Condition2 | BldgType | HouseStyle | OverallQual | OverallCond | RoofStyle | RoofMatl | Exterior1st | Exterior2nd | MasVnrType | ExterQual | ExterCond | Foundation | BsmtQual | BsmtCond | BsmtExposure | BsmtFinType1 | BsmtFinType2 | Heating | HeatingQC | CentralAir | Electrical | KitchenQual | Functional | FireplaceQu | GarageType | GarageFinish | GarageQual | GarageCond | PavedDrive | PoolQC | Fence | MiscFeature | SaleType | SaleCondition |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 20 | RH | Pave | NAA | Reg | Lvl | AllPub | Inside | Gtl | NAmes | Feedr | Norm | 1Fam | 1Story | 5 | 6 | Gable | CompShg | VinylSd | VinylSd | None | TA | TA | CBlock | TA | TA | No | Rec | LwQ | GasA | TA | Y | SBrkr | TA | Typ | NF | Attchd | Unf | TA | TA | Y | NP | MnPrv | NE | WD | Normal |
| 1 | 20 | RL | Pave | NAA | IR1 | Lvl | AllPub | Corner | Gtl | NAmes | Norm | Norm | 1Fam | 1Story | 6 | 6 | Hip | CompShg | Wd Sdng | Wd Sdng | BrkFace | TA | TA | CBlock | TA | TA | No | ALQ | Unf | GasA | TA | Y | SBrkr | Gd | Typ | NF | Attchd | Unf | TA | TA | Y | NP | NF | Gar2 | WD | Normal |
| 2 | 60 | RL | Pave | NAA | IR1 | Lvl | AllPub | Inside | Gtl | Gilbert | Norm | Norm | 1Fam | 2Story | 5 | 5 | Gable | CompShg | VinylSd | VinylSd | None | TA | TA | PConc | Gd | TA | No | GLQ | Unf | GasA | Gd | Y | SBrkr | TA | Typ | TA | Attchd | Fin | TA | TA | Y | NP | MnPrv | NE | WD | Normal |
| 3 | 60 | RL | Pave | NAA | IR1 | Lvl | AllPub | Inside | Gtl | Gilbert | Norm | Norm | 1Fam | 2Story | 6 | 6 | Gable | CompShg | VinylSd | VinylSd | BrkFace | TA | TA | PConc | TA | TA | No | GLQ | Unf | GasA | Ex | Y | SBrkr | Gd | Typ | Gd | Attchd | Fin | TA | TA | Y | NP | NF | NE | WD | Normal |
| 4 | 120 | RL | Pave | NAA | IR1 | HLS | AllPub | Inside | Gtl | StoneBr | Norm | Norm | TwnhsE | 1Story | 8 | 5 | Gable | CompShg | HdBoard | HdBoard | None | Gd | TA | PConc | Gd | TA | No | ALQ | Unf | GasA | Ex | Y | SBrkr | Gd | Typ | NF | Attchd | RFn | TA | TA | Y | NP | NF | NE | WD | Normal |
## Display dimensions of category columns of test data.
test_imp_cat.shape
(1459, 46)
## Check NA values for category columns of test data after imputation.
test_imp_cat.isna().sum()
MSSubClass 0 MSZoning 0 Street 0 Alley 0 LotShape 0 LandContour 0 Utilities 0 LotConfig 0 LandSlope 0 Neighborhood 0 Condition1 0 Condition2 0 BldgType 0 HouseStyle 0 OverallQual 0 OverallCond 0 RoofStyle 0 RoofMatl 0 Exterior1st 0 Exterior2nd 0 MasVnrType 0 ExterQual 0 ExterCond 0 Foundation 0 BsmtQual 0 BsmtCond 0 BsmtExposure 0 BsmtFinType1 0 BsmtFinType2 0 Heating 0 HeatingQC 0 CentralAir 0 Electrical 0 KitchenQual 0 Functional 0 FireplaceQu 0 GarageType 0 GarageFinish 0 GarageQual 0 GarageCond 0 PavedDrive 0 PoolQC 0 Fence 0 MiscFeature 0 SaleType 0 SaleCondition 0 dtype: int64
#################################################### Standardization ##########################################################
## Import StandardScaler to scale the numeric values.
from sklearn.preprocessing import StandardScaler
## Instantiate the scaler and fit it on the imputed numeric columns of the train data.
scaler = StandardScaler()
scaler.fit(X_train_imp)
StandardScaler(copy=True, with_mean=True, with_std=True)
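## What the fitted scaler does to new rows can be checked by hand on toy data: each column is shifted by the
## training mean and divided by the training standard deviation. The arrays and the names toy_train, toy_new,
## toy_scaler are hypothetical and used only for this sketch.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training matrix (3 rows, 2 numeric columns) and one unseen row.
toy_train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
toy_new = np.array([[4.0, 100.0]])

toy_scaler = StandardScaler().fit(toy_train)

# StandardScaler uses the population standard deviation (ddof=0),
# which is also the default of ndarray.std().
manual = (toy_new - toy_train.mean(axis=0)) / toy_train.std(axis=0)
scaled = toy_scaler.transform(toy_new)

print(np.allclose(manual, scaled))
```

## Because the statistics come from train only, scaled validation and test columns need not have mean 0 and unit variance.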
## Standardize the numeric columns of the train data, build a dataframe, and display the first 5 records.
X_train_scaler = scaler.transform(X_train_imp)
X_train_scaler = pd.DataFrame(X_train_scaler,columns=X_train_imp.columns)
X_train_scaler.head()
| | LotFrontage | LotArea | YearBuilt | YearRemodAdd | MasVnrArea | BsmtFinSF1 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | 1stFlrSF | 2ndFlrSF | LowQualFinSF | GrLivArea | BedroomAbvGr | TotRmsAbvGrd | GarageYrBlt | GarageCars | GarageArea | WoodDeckSF | OpenPorchSF | EnclosedPorch | 3SsnPorch | ScreenPorch | PoolArea | MiscVal | MoSold | YrSold |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.561119 | -0.150083 | 0.842112 | 0.534621 | -0.570667 | -0.301414 | -0.28229 | 0.009986 | -0.403595 | -0.628924 | -0.797828 | -0.1294 | -1.122539 | -1.03986 | -0.924643 | 0.824828 | 0.333696 | 0.518931 | -0.749421 | -0.685547 | -0.360803 | -0.112837 | -0.271032 | -0.069193 | -0.108754 | 0.232125 | -1.396198 |
| 1 | -0.653665 | -0.505027 | -0.986602 | -0.914670 | -0.570667 | -0.932224 | -0.28229 | 0.361595 | -0.714639 | 0.353300 | -0.797828 | -0.1294 | -0.394941 | 0.14623 | -0.319271 | 0.046080 | -2.330655 | -2.190237 | 1.284957 | -0.685547 | -0.360803 | -0.112837 | -0.271032 | -0.069193 | 1.343344 | -0.506915 | 1.618941 |
| 2 | 0.040432 | -0.183080 | -0.104901 | -0.866360 | -0.570667 | 0.850133 | -0.28229 | -1.121474 | -0.311597 | -0.633935 | -0.797828 | -0.1294 | -1.126251 | 0.14623 | -0.924643 | -0.158854 | -0.998480 | -0.609889 | -0.749421 | -0.685547 | -0.360803 | -0.112837 | -0.271032 | -0.069193 | -0.108754 | -0.506915 | 1.618941 |
| 3 | 0.688257 | -0.169171 | -0.170212 | -0.962979 | -0.009994 | 0.428887 | -0.28229 | -0.276260 | 0.073922 | 0.398402 | -0.797828 | -0.1294 | -0.361531 | 0.14623 | -0.319271 | -0.527734 | 0.333696 | -0.120733 | -0.749421 | 1.434220 | -0.360803 | -0.112837 | -0.271032 | -0.069193 | -0.108754 | 2.079725 | -1.396198 |
| 4 | -0.237207 | -0.361669 | 1.201324 | 1.066028 | 0.028864 | 0.473340 | -0.28229 | -0.053123 | 0.336776 | 0.107744 | -0.797828 | -0.1294 | -0.576840 | -1.03986 | -0.319271 | 1.193708 | 0.333696 | 0.763509 | 0.062783 | 0.143927 | -0.360803 | -0.112837 | -0.271032 | -0.069193 | -0.108754 | 0.971165 | -0.642413 |
## Standardize the numeric columns of the validation data, build a dataframe, and display the first 5 records.
X_test_scaler = scaler.transform(X_test_imp)
X_test_scaler = pd.DataFrame(X_test_scaler,columns=X_test_imp.columns)
X_test_scaler.head()
| | LotFrontage | LotArea | YearBuilt | YearRemodAdd | MasVnrArea | BsmtFinSF1 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | 1stFlrSF | 2ndFlrSF | LowQualFinSF | GrLivArea | BedroomAbvGr | TotRmsAbvGrd | GarageYrBlt | GarageCars | GarageArea | WoodDeckSF | OpenPorchSF | EnclosedPorch | 3SsnPorch | ScreenPorch | PoolArea | MiscVal | MoSold | YrSold |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.052114 | -0.083176 | 1.005390 | 0.776169 | 0.428552 | -0.932224 | -0.28229 | 0.713204 | -0.372929 | -0.704094 | 1.847518 | -0.1294 | 0.958169 | 0.146230 | 0.891472 | 0.947788 | 0.333696 | -0.148954 | 0.364459 | 0.051763 | -0.360803 | -0.112837 | -0.271032 | -0.069193 | -0.108754 | -0.506915 | 1.618941 |
| 1 | -0.422299 | -0.073531 | -2.292826 | -1.687625 | -0.570667 | -0.932224 | -0.28229 | 1.188778 | 0.089255 | -0.175397 | 0.762719 | -0.1294 | 0.473722 | 1.332321 | 0.891472 | -2.372137 | 1.665871 | 1.473725 | -0.749421 | -0.685547 | 1.088712 | -0.112837 | -0.271032 | -0.069193 | -0.108754 | -0.506915 | -1.396198 |
| 2 | -1.717949 | -0.706764 | 1.103357 | 0.921098 | 0.078825 | -0.932224 | -0.28229 | 0.920563 | -0.171408 | -0.473572 | 0.884529 | -0.1294 | 0.351219 | 0.146230 | 0.286100 | 1.070748 | 0.333696 | -0.402938 | -0.749421 | -0.071122 | -0.360803 | -0.112837 | -0.271032 | -0.069193 | -0.108754 | -0.876435 | -1.396198 |
| 3 | 0.271798 | -0.058301 | 0.907423 | 0.631240 | -0.570667 | 1.131669 | -0.28229 | -0.979478 | 0.117731 | -0.142823 | 1.475193 | -0.1294 | 1.073249 | 0.146230 | 0.891472 | 0.824828 | 0.333696 | 0.551855 | 1.207604 | 1.925761 | -0.360803 | -0.112837 | -0.271032 | -0.069193 | -0.108754 | -0.137395 | -1.396198 |
| 4 | -0.422299 | 0.061502 | -0.823324 | -1.687625 | -0.570667 | 0.295527 | -0.28229 | -0.528697 | -0.309406 | -0.293164 | 0.374306 | -0.1294 | 0.072801 | 0.146230 | -0.319271 | -0.650694 | -0.998480 | -0.835653 | -0.749421 | -0.685547 | -0.360803 | -0.112837 | -0.271032 | -0.069193 | -0.108754 | -0.876435 | 0.111371 |
## Standardize the numeric columns of the test data, build a dataframe, and display the first 5 records.
test_scaler = scaler.transform(test_imp)
test_scaler = pd.DataFrame(test_scaler,columns=test_imp.columns)
test_scaler.head()
| | LotFrontage | LotArea | YearBuilt | YearRemodAdd | MasVnrArea | BsmtFinSF1 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | 1stFlrSF | 2ndFlrSF | LowQualFinSF | GrLivArea | BedroomAbvGr | TotRmsAbvGrd | GarageYrBlt | GarageCars | GarageArea | WoodDeckSF | OpenPorchSF | EnclosedPorch | 3SsnPorch | ScreenPorch | PoolArea | MiscVal | MoSold | YrSold |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.503164 | 0.131760 | -0.300834 | -1.156218 | -0.570667 | 0.058444 | 0.595815 | -0.670693 | -0.377310 | -0.674026 | -0.797828 | -0.1294 | -1.155949 | -1.03986 | -0.924643 | -0.691681 | -0.998480 | 1.243258 | 0.333518 | -0.685547 | -0.360803 | -0.112837 | 1.843566 | -0.069193 | -0.108754 | -0.137395 | 1.618941 |
| 1 | 0.549437 | 0.400303 | -0.398801 | -1.301147 | 0.028864 | 1.021594 | -0.282290 | -0.364162 | 0.601820 | 0.410930 | -0.797828 | -0.1294 | -0.352250 | 0.14623 | -0.319271 | -0.814641 | -0.998480 | -0.722771 | 2.290543 | -0.132564 | -0.360803 | -0.112837 | -0.271032 | -0.069193 | 40.227307 | -0.137395 | 1.618941 |
| 2 | 0.225525 | 0.355935 | 0.874768 | 0.631240 | -0.570667 | 0.742175 | -0.282290 | -0.970462 | -0.276549 | -0.593844 | 0.813282 | -0.1294 | 0.204585 | 0.14623 | -0.319271 | 0.783841 | 0.333696 | 0.076810 | 0.890458 | -0.163286 | -0.360803 | -0.112837 | -0.271032 | -0.069193 | -0.108754 | -1.245955 | 1.618941 |
| 3 | 0.410618 | -0.035153 | 0.907423 | 0.631240 | -0.459643 | 0.342097 | -0.282290 | -0.548982 | -0.280930 | -0.598856 | 0.760421 | -0.1294 | 0.158182 | 0.14623 | 0.286100 | 0.824828 | 0.333696 | 0.020369 | 2.035278 | -0.132564 | -0.360803 | -0.112837 | -0.271032 | -0.069193 | -0.108754 | -0.137395 | 1.618941 |
| 4 | -1.208944 | -0.540054 | 0.711490 | 0.341382 | -0.570667 | -0.375502 | -0.282290 | 1.012973 | 0.494488 | 0.288152 | -0.797828 | -0.1294 | -0.443200 | -1.03986 | -0.924643 | 0.578907 | 0.333696 | 0.189692 | -0.749421 | 0.574025 | -0.360803 | -0.112837 | 2.266486 | -0.069193 | -0.108754 | -1.984995 | 1.618941 |
## Combine the numeric and category columns of the train data and display the dimensions of the resulting dataframe.
train_result = pd.concat([X_train_scaler, X_train_imp_cat], axis=1)
train_result.shape
(1022, 73)
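## pd.concat with axis=1 matches rows by index label, not by position. The frames combined above were rebuilt
## from NumPy arrays, so both carry a fresh 0..n-1 index and line up. A sketch of what happens otherwise,
## using hypothetical toy frames num_part and cat_part:

```python
import pandas as pd

# num_part has a default RangeIndex; cat_part keeps some original row labels.
num_part = pd.DataFrame({"LotArea": [0.1, -0.2, 0.3]})
cat_part = pd.DataFrame({"Street": ["Pave", "Grvl", "Pave"]},
                        index=[10, 11, 12])

# Labels do not overlap, so the outer join yields 6 rows padded with NaN.
misaligned = pd.concat([num_part, cat_part], axis=1)
print(misaligned.shape)

# Resetting both indexes restores a positional, row-by-row combination.
aligned = pd.concat([num_part.reset_index(drop=True),
                     cat_part.reset_index(drop=True)], axis=1)
print(aligned.shape)
```

## A row count equal to the inputs and an all-zero isna() check, as in the cells here, indicate the indexes lined up.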
## Check first 5 records of train data.
train_result.head()
| | LotFrontage | LotArea | YearBuilt | YearRemodAdd | MasVnrArea | BsmtFinSF1 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | 1stFlrSF | 2ndFlrSF | LowQualFinSF | GrLivArea | BedroomAbvGr | TotRmsAbvGrd | GarageYrBlt | GarageCars | GarageArea | WoodDeckSF | OpenPorchSF | EnclosedPorch | 3SsnPorch | ScreenPorch | PoolArea | MiscVal | MoSold | YrSold | MSSubClass | MSZoning | Street | Alley | LotShape | LandContour | Utilities | LotConfig | LandSlope | Neighborhood | Condition1 | Condition2 | BldgType | HouseStyle | OverallQual | OverallCond | RoofStyle | RoofMatl | Exterior1st | Exterior2nd | MasVnrType | ExterQual | ExterCond | Foundation | BsmtQual | BsmtCond | BsmtExposure | BsmtFinType1 | BsmtFinType2 | Heating | HeatingQC | CentralAir | Electrical | KitchenQual | Functional | FireplaceQu | GarageType | GarageFinish | GarageQual | GarageCond | PavedDrive | PoolQC | Fence | MiscFeature | SaleType | SaleCondition |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.561119 | -0.150083 | 0.842112 | 0.534621 | -0.570667 | -0.301414 | -0.28229 | 0.009986 | -0.403595 | -0.628924 | -0.797828 | -0.1294 | -1.122539 | -1.03986 | -0.924643 | 0.824828 | 0.333696 | 0.518931 | -0.749421 | -0.685547 | -0.360803 | -0.112837 | -0.271032 | -0.069193 | -0.108754 | 0.232125 | -1.396198 | 85 | RL | Pave | NAA | IR1 | Lvl | AllPub | CulDSac | Gtl | CollgCr | Norm | Norm | 1Fam | SFoyer | 5 | 5 | Gable | CompShg | VinylSd | VinylSd | None | Gd | TA | PConc | Gd | TA | Av | GLQ | Unf | GasA | Ex | Y | SBrkr | TA | Typ | NF | Detchd | Unf | TA | TA | Y | NP | NF | NE | WD | Normal |
| 1 | -0.653665 | -0.505027 | -0.986602 | -0.914670 | -0.570667 | -0.932224 | -0.28229 | 0.361595 | -0.714639 | 0.353300 | -0.797828 | -0.1294 | -0.394941 | 0.14623 | -0.319271 | 0.046080 | -2.330655 | -2.190237 | 1.284957 | -0.685547 | -0.360803 | -0.112837 | -0.271032 | -0.069193 | 1.343344 | -0.506915 | 1.618941 | 30 | RL | Pave | NAA | IR1 | Lvl | AllPub | Inside | Gtl | BrkSide | Norm | Norm | 1Fam | 1Story | 3 | 2 | Gable | CompShg | Wd Sdng | Plywood | None | TA | Po | CBlock | TA | TA | No | Unf | Unf | GasA | Ex | Y | SBrkr | Fa | Mod | NF | NG | NG | NG | NG | Y | NP | GdWo | Shed | WD | Normal |
| 2 | 0.040432 | -0.183080 | -0.104901 | -0.866360 | -0.570667 | 0.850133 | -0.28229 | -1.121474 | -0.311597 | -0.633935 | -0.797828 | -0.1294 | -1.126251 | 0.14623 | -0.924643 | -0.158854 | -0.998480 | -0.609889 | -0.749421 | -0.685547 | -0.360803 | -0.112837 | -0.271032 | -0.069193 | -0.108754 | -0.506915 | 1.618941 | 20 | RL | Pave | NAA | Reg | Lvl | AllPub | FR2 | Gtl | Sawyer | Feedr | Norm | 1Fam | 1Story | 5 | 5 | Gable | CompShg | HdBoard | HdBoard | None | TA | TA | CBlock | TA | TA | No | ALQ | Unf | GasA | TA | Y | SBrkr | TA | Typ | Fa | Detchd | Unf | TA | TA | Y | NP | MnPrv | NE | WD | Normal |
| 3 | 0.688257 | -0.169171 | -0.170212 | -0.962979 | -0.009994 | 0.428887 | -0.28229 | -0.276260 | 0.073922 | 0.398402 | -0.797828 | -0.1294 | -0.361531 | 0.14623 | -0.319271 | -0.527734 | 0.333696 | -0.120733 | -0.749421 | 1.434220 | -0.360803 | -0.112837 | -0.271032 | -0.069193 | -0.108754 | 2.079725 | -1.396198 | 20 | RL | Pave | NAA | Reg | Lvl | AllPub | Inside | Gtl | NAmes | Norm | Norm | 1Fam | 1Story | 6 | 5 | Gable | CompShg | Wd Sdng | Wd Sdng | BrkFace | TA | TA | CBlock | TA | TA | No | Rec | Unf | GasA | Ex | Y | SBrkr | TA | Typ | TA | Attchd | RFn | TA | TA | Y | NP | GdWo | NE | WD | Abnorml |
| 4 | -0.237207 | -0.361669 | 1.201324 | 1.066028 | 0.028864 | 0.473340 | -0.28229 | -0.053123 | 0.336776 | 0.107744 | -0.797828 | -0.1294 | -0.576840 | -1.03986 | -0.319271 | 1.193708 | 0.333696 | 0.763509 | 0.062783 | 0.143927 | -0.360803 | -0.112837 | -0.271032 | -0.069193 | -0.108754 | 0.971165 | -0.642413 | 20 | RL | Pave | NAA | Reg | Lvl | AllPub | Inside | Gtl | CollgCr | Norm | Norm | 1Fam | 1Story | 7 | 5 | Gable | CompShg | VinylSd | VinylSd | BrkFace | Gd | TA | PConc | Gd | TA | No | GLQ | Unf | GasA | Ex | Y | SBrkr | Gd | Typ | NF | Attchd | RFn | TA | TA | Y | NP | NF | NE | New | Partial |
## Check NA values for train data.
train_result.isna().sum()
LotFrontage 0 LotArea 0 YearBuilt 0 YearRemodAdd 0 MasVnrArea 0 BsmtFinSF1 0 BsmtFinSF2 0 BsmtUnfSF 0 TotalBsmtSF 0 1stFlrSF 0 2ndFlrSF 0 LowQualFinSF 0 GrLivArea 0 BedroomAbvGr 0 TotRmsAbvGrd 0 GarageYrBlt 0 GarageCars 0 GarageArea 0 WoodDeckSF 0 OpenPorchSF 0 EnclosedPorch 0 3SsnPorch 0 ScreenPorch 0 PoolArea 0 MiscVal 0 MoSold 0 YrSold 0 MSSubClass 0 MSZoning 0 Street 0 Alley 0 LotShape 0 LandContour 0 Utilities 0 LotConfig 0 LandSlope 0 Neighborhood 0 Condition1 0 Condition2 0 BldgType 0 HouseStyle 0 OverallQual 0 OverallCond 0 RoofStyle 0 RoofMatl 0 Exterior1st 0 Exterior2nd 0 MasVnrType 0 ExterQual 0 ExterCond 0 Foundation 0 BsmtQual 0 BsmtCond 0 BsmtExposure 0 BsmtFinType1 0 BsmtFinType2 0 Heating 0 HeatingQC 0 CentralAir 0 Electrical 0 KitchenQual 0 Functional 0 FireplaceQu 0 GarageType 0 GarageFinish 0 GarageQual 0 GarageCond 0 PavedDrive 0 PoolQC 0 Fence 0 MiscFeature 0 SaleType 0 SaleCondition 0 dtype: int64
## Prepare a dataframe with train data.
dataframe1 = pd.DataFrame(train_result)
## Copy dataframe data into a CSV file.
dataframe1.to_csv('TrainDataPreprocess.csv',index=False)
## Combine the numeric and category columns of the validation data.
test_result = pd.concat([X_test_scaler, X_test_imp_cat], axis=1)
## Check the dimensions of the validation data.
test_result.shape
(438, 73)
## Get first 5 records of validation data.
test_result.head()
| | LotFrontage | LotArea | YearBuilt | YearRemodAdd | MasVnrArea | BsmtFinSF1 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | 1stFlrSF | 2ndFlrSF | LowQualFinSF | GrLivArea | BedroomAbvGr | TotRmsAbvGrd | GarageYrBlt | GarageCars | GarageArea | WoodDeckSF | OpenPorchSF | EnclosedPorch | 3SsnPorch | ScreenPorch | PoolArea | MiscVal | MoSold | YrSold | MSSubClass | MSZoning | Street | Alley | LotShape | LandContour | Utilities | LotConfig | LandSlope | Neighborhood | Condition1 | Condition2 | BldgType | HouseStyle | OverallQual | OverallCond | RoofStyle | RoofMatl | Exterior1st | Exterior2nd | MasVnrType | ExterQual | ExterCond | Foundation | BsmtQual | BsmtCond | BsmtExposure | BsmtFinType1 | BsmtFinType2 | Heating | HeatingQC | CentralAir | Electrical | KitchenQual | Functional | FireplaceQu | GarageType | GarageFinish | GarageQual | GarageCond | PavedDrive | PoolQC | Fence | MiscFeature | SaleType | SaleCondition |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.052114 | -0.083176 | 1.005390 | 0.776169 | 0.428552 | -0.932224 | -0.28229 | 0.713204 | -0.372929 | -0.704094 | 1.847518 | -0.1294 | 0.958169 | 0.146230 | 0.891472 | 0.947788 | 0.333696 | -0.148954 | 0.364459 | 0.051763 | -0.360803 | -0.112837 | -0.271032 | -0.069193 | -0.108754 | -0.506915 | 1.618941 | 60 | RL | Pave | NAA | IR1 | Lvl | AllPub | CulDSac | Gtl | Gilbert | Norm | Norm | 1Fam | 2Story | 7 | 5 | Gable | CompShg | VinylSd | VinylSd | BrkFace | Gd | TA | PConc | Gd | TA | No | Unf | Unf | GasA | Ex | Y | SBrkr | Gd | Typ | Gd | BuiltIn | Fin | TA | TA | Y | NP | NF | NE | WD | Normal |
| 1 | -0.422299 | -0.073531 | -2.292826 | -1.687625 | -0.570667 | -0.932224 | -0.28229 | 1.188778 | 0.089255 | -0.175397 | 0.762719 | -0.1294 | 0.473722 | 1.332321 | 0.891472 | -2.372137 | 1.665871 | 1.473725 | -0.749421 | -0.685547 | 1.088712 | -0.112837 | -0.271032 | -0.069193 | -0.108754 | -0.506915 | -1.396198 | 70 | RM | Pave | Grvl | Reg | Lvl | AllPub | Inside | Gtl | OldTown | Norm | Norm | 1Fam | 2Story | 4 | 2 | Gable | CompShg | AsbShng | Stucco | None | TA | TA | BrkTil | TA | Fa | No | Unf | Unf | GasW | Fa | N | SBrkr | TA | Min2 | NF | 2Types | Unf | Fa | Fa | N | NP | NF | NE | WD | Normal |
| 2 | -1.717949 | -0.706764 | 1.103357 | 0.921098 | 0.078825 | -0.932224 | -0.28229 | 0.920563 | -0.171408 | -0.473572 | 0.884529 | -0.1294 | 0.351219 | 0.146230 | 0.286100 | 1.070748 | 0.333696 | -0.402938 | -0.749421 | -0.071122 | -0.360803 | -0.112837 | -0.271032 | -0.069193 | -0.108754 | -0.876435 | -1.396198 | 160 | RM | Pave | NAA | Reg | Lvl | AllPub | Inside | Gtl | Edwards | Norm | Norm | TwnhsE | 2Story | 7 | 5 | Gable | CompShg | VinylSd | VinylSd | Stone | Gd | TA | PConc | Gd | TA | No | Unf | Unf | GasA | Ex | Y | SBrkr | Gd | Maj1 | NF | Detchd | Unf | TA | TA | Y | NP | NF | NE | WD | Normal |
| 3 | 0.271798 | -0.058301 | 0.907423 | 0.631240 | -0.570667 | 1.131669 | -0.28229 | -0.979478 | 0.117731 | -0.142823 | 1.475193 | -0.1294 | 1.073249 | 0.146230 | 0.891472 | 0.824828 | 0.333696 | 0.551855 | 1.207604 | 1.925761 | -0.360803 | -0.112837 | -0.271032 | -0.069193 | -0.108754 | -0.137395 | -1.396198 | 60 | RL | Pave | NAA | Reg | Lvl | AllPub | Corner | Gtl | CollgCr | Norm | Norm | 1Fam | 2Story | 7 | 6 | Gable | CompShg | VinylSd | VinylSd | None | TA | TA | PConc | Gd | TA | Av | GLQ | Unf | GasA | Ex | Y | SBrkr | Gd | Typ | TA | Detchd | RFn | TA | TA | Y | NP | NF | NE | WD | Normal |
| 4 | -0.422299 | 0.061502 | -0.823324 | -1.687625 | -0.570667 | 0.295527 | -0.28229 | -0.528697 | -0.309406 | -0.293164 | 0.374306 | -0.1294 | 0.072801 | 0.146230 | -0.319271 | -0.650694 | -0.998480 | -0.835653 | -0.749421 | -0.685547 | -0.360803 | -0.112837 | -0.271032 | -0.069193 | -0.108754 | -0.876435 | 0.111371 | 50 | RL | Pave | Grvl | Reg | Bnk | AllPub | Inside | Gtl | NAmes | Artery | Norm | 1Fam | 1.5Fin | 5 | 6 | Gable | CompShg | MetalSd | MetalSd | None | TA | TA | CBlock | TA | TA | No | BLQ | Unf | GasA | TA | Y | FuseA | TA | Typ | TA | Attchd | Unf | TA | TA | Y | NP | NF | NE | WD | Normal |
## Check NA values for validation data after combining numeric and category columns.
test_result.isna().sum()
LotFrontage 0 LotArea 0 YearBuilt 0 YearRemodAdd 0 MasVnrArea 0 BsmtFinSF1 0 BsmtFinSF2 0 BsmtUnfSF 0 TotalBsmtSF 0 1stFlrSF 0 2ndFlrSF 0 LowQualFinSF 0 GrLivArea 0 BedroomAbvGr 0 TotRmsAbvGrd 0 GarageYrBlt 0 GarageCars 0 GarageArea 0 WoodDeckSF 0 OpenPorchSF 0 EnclosedPorch 0 3SsnPorch 0 ScreenPorch 0 PoolArea 0 MiscVal 0 MoSold 0 YrSold 0 MSSubClass 0 MSZoning 0 Street 0 Alley 0 LotShape 0 LandContour 0 Utilities 0 LotConfig 0 LandSlope 0 Neighborhood 0 Condition1 0 Condition2 0 BldgType 0 HouseStyle 0 OverallQual 0 OverallCond 0 RoofStyle 0 RoofMatl 0 Exterior1st 0 Exterior2nd 0 MasVnrType 0 ExterQual 0 ExterCond 0 Foundation 0 BsmtQual 0 BsmtCond 0 BsmtExposure 0 BsmtFinType1 0 BsmtFinType2 0 Heating 0 HeatingQC 0 CentralAir 0 Electrical 0 KitchenQual 0 Functional 0 FireplaceQu 0 GarageType 0 GarageFinish 0 GarageQual 0 GarageCond 0 PavedDrive 0 PoolQC 0 Fence 0 MiscFeature 0 SaleType 0 SaleCondition 0 dtype: int64
## Prepare dataframe with validation data.
dataframe2 = pd.DataFrame(test_result)
## Copy dataframe data into a CSV file.
dataframe2.to_csv('ValidationDataPreprocess.csv',index=False)
## Prepare a dataframe with target column data of train.
dataframe3 = pd.DataFrame(y_train)
## Copy dataframe data into a CSV file.
dataframe3.to_csv('TrainTarget.csv',index=False)
## Prepare a dataframe with target column data of validation.
dataframe4 = pd.DataFrame(y_test)
## Copy dataframe data into a CSV file.
dataframe4.to_csv('ValidationTarget.csv',index=False)
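## A caution when reloading these CSV files: read_csv infers dtypes, so a category column whose levels look like
## numbers (MSSubClass, OverallQual, OverallCond) may come back as integers. A sketch with a hypothetical toy frame
## and an in-memory buffer in place of a file; the dtype mapping is an assumption about how such a file could be read back.

```python
import io
import pandas as pd

# Category codes stored as strings, next to one scaled numeric column.
toy = pd.DataFrame({"MSSubClass": ["60", "20"], "LotArea": [0.5, -0.3]})

buffer = io.StringIO()
toy.to_csv(buffer, index=False)

# Default read: the string codes are inferred as integers.
buffer.seek(0)
default_read = pd.read_csv(buffer)
print(default_read["MSSubClass"].dtype)

# Explicit dtype: the codes stay strings, so later dummy encoding treats them as levels.
buffer.seek(0)
typed_read = pd.read_csv(buffer, dtype={"MSSubClass": str})
print(typed_read["MSSubClass"].tolist())
```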
################################################## Dummification ###############################################################
## Display the number of levels in each category column of the train data.
for i in X_train_imp_cat:
    print(i, X_train_imp_cat[i].nunique())
MSSubClass 15 MSZoning 5 Street 2 Alley 3 LotShape 4 LandContour 4 Utilities 2 LotConfig 5 LandSlope 3 Neighborhood 25 Condition1 9 Condition2 8 BldgType 5 HouseStyle 8 OverallQual 10 OverallCond 9 RoofStyle 6 RoofMatl 7 Exterior1st 13 Exterior2nd 16 MasVnrType 5 ExterQual 4 ExterCond 5 Foundation 6 BsmtQual 5 BsmtCond 5 BsmtExposure 5 BsmtFinType1 7 BsmtFinType2 7 Heating 5 HeatingQC 5 CentralAir 2 Electrical 6 KitchenQual 4 Functional 6 FireplaceQu 6 GarageType 7 GarageFinish 4 GarageQual 6 GarageCond 6 PavedDrive 3 PoolQC 4 Fence 5 MiscFeature 4 SaleType 9 SaleCondition 6
## Display dimensions of train data.
X_train_imp_cat.shape
(1022, 46)
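## Level counts matter because get_dummies derives its columns from the levels present in the frame it is given.
## If validation or test data lack a level (or have a different first level), encoding each frame separately with
## drop_first=True can yield different columns. One possible way to keep them consistent is to declare the train
## levels as categories first. The toy frames and the helper dummify_like_train are hypothetical.

```python
import pandas as pd

toy_train_cat = pd.DataFrame({"Street": ["Pave", "Grvl", "Pave"],
                              "Alley": ["NAA", "Grvl", "Pave"]})
toy_valid_cat = pd.DataFrame({"Street": ["Pave", "Pave"],
                              "Alley": ["NAA", "Grvl"]})

def dummify_like_train(frame, train_frame):
    """One-hot encode `frame` using the category levels observed in `train_frame`."""
    typed = frame.copy()
    for col in train_frame.columns:
        levels = sorted(train_frame[col].unique())
        # Values not seen in train become NaN and encode as all zeros.
        typed[col] = pd.Categorical(typed[col], categories=levels)
    return pd.get_dummies(typed, columns=list(train_frame.columns), drop_first=True)

train_dummy = dummify_like_train(toy_train_cat, toy_train_cat)
valid_dummy = dummify_like_train(toy_valid_cat, toy_train_cat)

# Encoding the validation frame on its own loses the Street column entirely.
naive_cols = list(pd.get_dummies(toy_valid_cat, drop_first=True).columns)

print(list(train_dummy.columns))
print(list(valid_dummy.columns))
print(naive_cols)
```

## With shared categories, the train and validation dummy frames have identical columns in identical order.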
## Get dummies for the category columns of the train data, then display the dimensions and the first 5 records.
catcols_train_dummy = pd.get_dummies(data=X_train_imp_cat, columns=X_train_imp_cat.columns, drop_first=True)
print(catcols_train_dummy.shape)
catcols_train_dummy.head()
(1022, 250)
| MSSubClass_160 | MSSubClass_180 | MSSubClass_190 | MSSubClass_20 | MSSubClass_30 | MSSubClass_40 | MSSubClass_45 | MSSubClass_50 | MSSubClass_60 | MSSubClass_70 | MSSubClass_75 | MSSubClass_80 | MSSubClass_85 | MSSubClass_90 | MSZoning_FV | MSZoning_RH | MSZoning_RL | MSZoning_RM | Street_Pave | Alley_NAA | Alley_Pave | LotShape_IR2 | LotShape_IR3 | LotShape_Reg | LandContour_HLS | LandContour_Low | LandContour_Lvl | Utilities_NoSeWa | LotConfig_CulDSac | LotConfig_FR2 | LotConfig_FR3 | LotConfig_Inside | LandSlope_Mod | LandSlope_Sev | Neighborhood_Blueste | Neighborhood_BrDale | Neighborhood_BrkSide | Neighborhood_ClearCr | Neighborhood_CollgCr | Neighborhood_Crawfor | Neighborhood_Edwards | Neighborhood_Gilbert | Neighborhood_IDOTRR | Neighborhood_MeadowV | Neighborhood_Mitchel | Neighborhood_NAmes | Neighborhood_NPkVill | Neighborhood_NWAmes | Neighborhood_NoRidge | Neighborhood_NridgHt | Neighborhood_OldTown | Neighborhood_SWISU | Neighborhood_Sawyer | Neighborhood_SawyerW | Neighborhood_Somerst | Neighborhood_StoneBr | Neighborhood_Timber | Neighborhood_Veenker | Condition1_Feedr | Condition1_Norm | Condition1_PosA | Condition1_PosN | Condition1_RRAe | Condition1_RRAn | Condition1_RRNe | Condition1_RRNn | Condition2_Feedr | Condition2_Norm | Condition2_PosA | Condition2_PosN | Condition2_RRAe | Condition2_RRAn | Condition2_RRNn | BldgType_2fmCon | BldgType_Duplex | BldgType_Twnhs | BldgType_TwnhsE | HouseStyle_1.5Unf | HouseStyle_1Story | HouseStyle_2.5Fin | HouseStyle_2.5Unf | HouseStyle_2Story | HouseStyle_SFoyer | HouseStyle_SLvl | OverallQual_10 | OverallQual_2 | OverallQual_3 | OverallQual_4 | OverallQual_5 | OverallQual_6 | OverallQual_7 | OverallQual_8 | OverallQual_9 | OverallCond_2 | OverallCond_3 | OverallCond_4 | OverallCond_5 | OverallCond_6 | OverallCond_7 | OverallCond_8 | ... 
| Foundation_CBlock | Foundation_PConc | Foundation_Slab | Foundation_Stone | Foundation_Wood | BsmtQual_Fa | BsmtQual_Gd | BsmtQual_NB | BsmtQual_TA | BsmtCond_Gd | BsmtCond_NB | BsmtCond_Po | BsmtCond_TA | BsmtExposure_Gd | BsmtExposure_Mn | BsmtExposure_NB | BsmtExposure_No | BsmtFinType1_BLQ | BsmtFinType1_GLQ | BsmtFinType1_LwQ | BsmtFinType1_NB | BsmtFinType1_Rec | BsmtFinType1_Unf | BsmtFinType2_BLQ | BsmtFinType2_GLQ | BsmtFinType2_LwQ | BsmtFinType2_NB | BsmtFinType2_Rec | BsmtFinType2_Unf | Heating_GasW | Heating_Grav | Heating_OthW | Heating_Wall | HeatingQC_Fa | HeatingQC_Gd | HeatingQC_Po | HeatingQC_TA | CentralAir_Y | Electrical_FuseF | Electrical_FuseP | Electrical_Mix | Electrical_SBrkr | Electrical_nan | KitchenQual_Fa | KitchenQual_Gd | KitchenQual_TA | Functional_Maj2 | Functional_Min1 | Functional_Min2 | Functional_Mod | Functional_Typ | FireplaceQu_Fa | FireplaceQu_Gd | FireplaceQu_NF | FireplaceQu_Po | FireplaceQu_TA | GarageType_Attchd | GarageType_Basment | GarageType_BuiltIn | GarageType_CarPort | GarageType_Detchd | GarageType_NG | GarageFinish_NG | GarageFinish_RFn | GarageFinish_Unf | GarageQual_Fa | GarageQual_Gd | GarageQual_NG | GarageQual_Po | GarageQual_TA | GarageCond_Fa | GarageCond_Gd | GarageCond_NG | GarageCond_Po | GarageCond_TA | PavedDrive_P | PavedDrive_Y | PoolQC_Fa | PoolQC_Gd | PoolQC_NP | Fence_GdWo | Fence_MnPrv | Fence_MnWw | Fence_NF | MiscFeature_NE | MiscFeature_Shed | MiscFeature_TenC | SaleType_CWD | SaleType_Con | SaleType_ConLD | SaleType_ConLI | SaleType_ConLw | SaleType_New | SaleType_Oth | SaleType_WD | SaleCondition_AdjLand | SaleCondition_Alloca | SaleCondition_Family | SaleCondition_Normal | SaleCondition_Partial | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 2 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 3 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
5 rows × 250 columns
## Display category column levels of validation data.
for i in X_test_imp_cat:
    print(i, X_test_imp_cat[i].nunique())
MSSubClass 15
MSZoning 4
Street 2
Alley 3
LotShape 4
LandContour 4
Utilities 1
LotConfig 5
LandSlope 3
Neighborhood 25
Condition1 8
Condition2 2
BldgType 5
HouseStyle 8
OverallQual 8
OverallCond 8
RoofStyle 5
RoofMatl 5
Exterior1st 14
Exterior2nd 14
MasVnrType 5
ExterQual 4
ExterCond 4
Foundation 6
BsmtQual 5
BsmtCond 4
BsmtExposure 5
BsmtFinType1 7
BsmtFinType2 7
Heating 5
HeatingQC 4
CentralAir 2
Electrical 4
KitchenQual 4
Functional 7
FireplaceQu 6
GarageType 7
GarageFinish 4
GarageQual 6
GarageCond 6
PavedDrive 3
PoolQC 3
Fence 5
MiscFeature 4
SaleType 6
SaleCondition 6
## Display dimensions of category columns of validation data.
X_test_imp_cat.shape
(438, 46)
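The per-column level counts above show how many categories each split contains, but not which ones differ between splits. The sketch below lists, for each column, the levels that occur in only one of two frames. It is a minimal illustration on made-up toy data; `level_differences`, `train_cat` and `valid_cat` are hypothetical names, not objects from this notebook.

```python
import pandas as pd

# Toy stand-ins for two categorical splits (hypothetical values).
train_cat = pd.DataFrame({"Heating": ["GasA", "GasW", "OthW"],
                          "Street": ["Pave", "Grvl", "Pave"]})
valid_cat = pd.DataFrame({"Heating": ["Floor", "GasA", "GasW"],
                          "Street": ["Pave", "Pave", "Pave"]})

def level_differences(train, other):
    """Return, per column, the levels found in only one of the two frames."""
    report = {}
    for col in train.columns:
        train_levels = set(train[col].dropna().unique())
        other_levels = set(other[col].dropna().unique())
        unseen = other_levels - train_levels   # levels the train split never saw
        absent = train_levels - other_levels   # levels missing from the other split
        if unseen or absent:
            report[col] = {"unseen_in_train": sorted(unseen),
                           "absent_in_other": sorted(absent)}
    return report

diff = level_differences(train_cat, valid_cat)
print(diff)
```

Running a check like this before dummification makes it easier to anticipate which dummy columns will be missing or extra after `pd.get_dummies`.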
## Get dummies for category columns of validation data, display dimensions and first 5 records.
catcols_test_dummy = pd.get_dummies(columns=X_test_imp_cat.columns, data=X_test_imp_cat, drop_first=True)
print(catcols_test_dummy.shape)
catcols_test_dummy.head()
(438, 226)
| MSSubClass_160 | MSSubClass_180 | MSSubClass_190 | MSSubClass_20 | MSSubClass_30 | MSSubClass_40 | MSSubClass_45 | MSSubClass_50 | MSSubClass_60 | MSSubClass_70 | MSSubClass_75 | MSSubClass_80 | MSSubClass_85 | MSSubClass_90 | MSZoning_RH | MSZoning_RL | MSZoning_RM | Street_Pave | Alley_NAA | Alley_Pave | LotShape_IR2 | LotShape_IR3 | LotShape_Reg | LandContour_HLS | LandContour_Low | LandContour_Lvl | LotConfig_CulDSac | LotConfig_FR2 | LotConfig_FR3 | LotConfig_Inside | LandSlope_Mod | LandSlope_Sev | Neighborhood_Blueste | Neighborhood_BrDale | Neighborhood_BrkSide | Neighborhood_ClearCr | Neighborhood_CollgCr | Neighborhood_Crawfor | Neighborhood_Edwards | Neighborhood_Gilbert | Neighborhood_IDOTRR | Neighborhood_MeadowV | Neighborhood_Mitchel | Neighborhood_NAmes | Neighborhood_NPkVill | Neighborhood_NWAmes | Neighborhood_NoRidge | Neighborhood_NridgHt | Neighborhood_OldTown | Neighborhood_SWISU | Neighborhood_Sawyer | Neighborhood_SawyerW | Neighborhood_Somerst | Neighborhood_StoneBr | Neighborhood_Timber | Neighborhood_Veenker | Condition1_Feedr | Condition1_Norm | Condition1_PosA | Condition1_PosN | Condition1_RRAe | Condition1_RRAn | Condition1_RRNn | Condition2_Norm | BldgType_2fmCon | BldgType_Duplex | BldgType_Twnhs | BldgType_TwnhsE | HouseStyle_1.5Unf | HouseStyle_1Story | HouseStyle_2.5Fin | HouseStyle_2.5Unf | HouseStyle_2Story | HouseStyle_SFoyer | HouseStyle_SLvl | OverallQual_3 | OverallQual_4 | OverallQual_5 | OverallQual_6 | OverallQual_7 | OverallQual_8 | OverallQual_9 | OverallCond_3 | OverallCond_4 | OverallCond_5 | OverallCond_6 | OverallCond_7 | OverallCond_8 | OverallCond_9 | RoofStyle_Gable | RoofStyle_Hip | RoofStyle_Mansard | RoofStyle_Shed | RoofMatl_Metal | RoofMatl_Tar&Grv | RoofMatl_WdShake | RoofMatl_WdShngl | Exterior1st_AsphShn | Exterior1st_BrkComm | Exterior1st_BrkFace | ... 
| MasVnrType_nan | ExterQual_Fa | ExterQual_Gd | ExterQual_TA | ExterCond_Fa | ExterCond_Gd | ExterCond_TA | Foundation_CBlock | Foundation_PConc | Foundation_Slab | Foundation_Stone | Foundation_Wood | BsmtQual_Fa | BsmtQual_Gd | BsmtQual_NB | BsmtQual_TA | BsmtCond_Gd | BsmtCond_NB | BsmtCond_TA | BsmtExposure_Gd | BsmtExposure_Mn | BsmtExposure_NB | BsmtExposure_No | BsmtFinType1_BLQ | BsmtFinType1_GLQ | BsmtFinType1_LwQ | BsmtFinType1_NB | BsmtFinType1_Rec | BsmtFinType1_Unf | BsmtFinType2_BLQ | BsmtFinType2_GLQ | BsmtFinType2_LwQ | BsmtFinType2_NB | BsmtFinType2_Rec | BsmtFinType2_Unf | Heating_GasA | Heating_GasW | Heating_Grav | Heating_Wall | HeatingQC_Fa | HeatingQC_Gd | HeatingQC_TA | CentralAir_Y | Electrical_FuseF | Electrical_FuseP | Electrical_SBrkr | KitchenQual_Fa | KitchenQual_Gd | KitchenQual_TA | Functional_Maj2 | Functional_Min1 | Functional_Min2 | Functional_Mod | Functional_Sev | Functional_Typ | FireplaceQu_Fa | FireplaceQu_Gd | FireplaceQu_NF | FireplaceQu_Po | FireplaceQu_TA | GarageType_Attchd | GarageType_Basment | GarageType_BuiltIn | GarageType_CarPort | GarageType_Detchd | GarageType_NG | GarageFinish_NG | GarageFinish_RFn | GarageFinish_Unf | GarageQual_Fa | GarageQual_Gd | GarageQual_NG | GarageQual_Po | GarageQual_TA | GarageCond_Fa | GarageCond_Gd | GarageCond_NG | GarageCond_Po | GarageCond_TA | PavedDrive_P | PavedDrive_Y | PoolQC_Gd | PoolQC_NP | Fence_GdWo | Fence_MnPrv | Fence_MnWw | Fence_NF | MiscFeature_NE | MiscFeature_Othr | MiscFeature_Shed | SaleType_ConLD | SaleType_ConLI | SaleType_ConLw | SaleType_New | SaleType_WD | SaleCondition_AdjLand | SaleCondition_Alloca | SaleCondition_Family | SaleCondition_Normal | SaleCondition_Partial | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
5 rows × 226 columns
#train_levels,test_levels = catcols_train_dummy.align(catcols_test_dummy, join='outer', axis=1, fill_value=0)
## Find dummy columns present in the train data but missing from the validation data.
missing_cols = set(catcols_train_dummy.columns) - set(catcols_test_dummy.columns)
## Add each missing column to the validation data with a default value of 0.
for c in missing_cols:
    catcols_test_dummy[c] = 0
## Reorder to match the train columns; this also drops columns that exist only in the validation data.
catcols_test_dummy = catcols_test_dummy[catcols_train_dummy.columns]
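The loop above adds the missing columns one at a time and then reorders them. A more compact way to get the same result, sketched here on toy frames, is `DataFrame.reindex` with `fill_value=0`. The frames `train_dummy` and `valid_dummy` and their values are made up for illustration. Note that both approaches silently drop columns that exist only in the validation frame (for example `Heating_GasA` or `Functional_Sev` in the validation header shown earlier), so rows with those levels end up encoded as the baseline.

```python
import pandas as pd

# Toy stand-ins for the dummy-encoded train and validation frames (hypothetical data).
train_dummy = pd.DataFrame({"Heating_GasW": [0, 1],
                            "Heating_OthW": [1, 0],
                            "PoolQC_Fa": [0, 0]})
valid_dummy = pd.DataFrame({"Heating_GasW": [1, 0],
                            "Heating_GasA": [0, 1]})

# reindex keeps the training column order, fills columns absent from the
# validation frame with 0, and drops columns the training frame never saw.
valid_aligned = valid_dummy.reindex(columns=train_dummy.columns, fill_value=0)

print(valid_aligned.columns.tolist())
print(valid_aligned.shape)
```

Because `reindex` returns a new frame, the original `valid_dummy` is left untouched, which can help when the same cell is re-run.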
## Display dimensions of category columns of train data.
catcols_train_dummy.shape
(1022, 250)
## Display dimensions of category columns of validation data.
catcols_test_dummy.shape
(438, 250)
## Combine numeric and category columns of train data.
train_data_final = pd.concat([X_train_scaler, catcols_train_dummy], axis=1)
## Check dimensions of train data.
train_data_final.shape
(1022, 277)
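`pd.concat(..., axis=1)` joins rows by index label, not by position. The shape above matches the expected 1022 rows, which suggests the two frames here share the same index. The sketch below shows, on made-up toy frames, what happens when they do not, and how resetting both indexes avoids it. `numeric` and `dummies` are hypothetical names used only for this illustration.

```python
import pandas as pd

# Numeric block with a default RangeIndex, as a scaler output wrapped in a
# DataFrame would typically have (an assumption of this sketch).
numeric = pd.DataFrame({"LotArea": [0.1, -0.2, 0.3]})
# Dummy block that still carries the shuffled index of a train/validation split.
dummies = pd.DataFrame({"Street_Pave": [1, 0, 1]}, index=[7, 3, 9])

# Index labels do not overlap, so concat produces the union of labels and NaNs.
misaligned = pd.concat([numeric, dummies], axis=1)

# Resetting both indexes makes the join purely positional.
aligned = pd.concat([numeric.reset_index(drop=True),
                     dummies.reset_index(drop=True)], axis=1)

print(misaligned.shape, aligned.shape)
```

A quick check of the row count and of `isna().sum().sum()` after the concat is usually enough to catch this kind of misalignment.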
## Check dimensions of target variable of train data.
y_train.shape
(1022,)
## Check first 5 records of train data.
train_data_final.head()
| LotFrontage | LotArea | YearBuilt | YearRemodAdd | MasVnrArea | BsmtFinSF1 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | 1stFlrSF | 2ndFlrSF | LowQualFinSF | GrLivArea | BedroomAbvGr | TotRmsAbvGrd | GarageYrBlt | GarageCars | GarageArea | WoodDeckSF | OpenPorchSF | EnclosedPorch | 3SsnPorch | ScreenPorch | PoolArea | MiscVal | MoSold | YrSold | MSSubClass_160 | MSSubClass_180 | MSSubClass_190 | MSSubClass_20 | MSSubClass_30 | MSSubClass_40 | MSSubClass_45 | MSSubClass_50 | MSSubClass_60 | MSSubClass_70 | MSSubClass_75 | MSSubClass_80 | MSSubClass_85 | MSSubClass_90 | MSZoning_FV | MSZoning_RH | MSZoning_RL | MSZoning_RM | Street_Pave | Alley_NAA | Alley_Pave | LotShape_IR2 | LotShape_IR3 | LotShape_Reg | LandContour_HLS | LandContour_Low | LandContour_Lvl | Utilities_NoSeWa | LotConfig_CulDSac | LotConfig_FR2 | LotConfig_FR3 | LotConfig_Inside | LandSlope_Mod | LandSlope_Sev | Neighborhood_Blueste | Neighborhood_BrDale | Neighborhood_BrkSide | Neighborhood_ClearCr | Neighborhood_CollgCr | Neighborhood_Crawfor | Neighborhood_Edwards | Neighborhood_Gilbert | Neighborhood_IDOTRR | Neighborhood_MeadowV | Neighborhood_Mitchel | Neighborhood_NAmes | Neighborhood_NPkVill | Neighborhood_NWAmes | Neighborhood_NoRidge | Neighborhood_NridgHt | Neighborhood_OldTown | Neighborhood_SWISU | Neighborhood_Sawyer | Neighborhood_SawyerW | Neighborhood_Somerst | Neighborhood_StoneBr | Neighborhood_Timber | Neighborhood_Veenker | Condition1_Feedr | Condition1_Norm | Condition1_PosA | Condition1_PosN | Condition1_RRAe | Condition1_RRAn | Condition1_RRNe | Condition1_RRNn | Condition2_Feedr | Condition2_Norm | Condition2_PosA | Condition2_PosN | Condition2_RRAe | Condition2_RRAn | Condition2_RRNn | ... 
| Foundation_CBlock | Foundation_PConc | Foundation_Slab | Foundation_Stone | Foundation_Wood | BsmtQual_Fa | BsmtQual_Gd | BsmtQual_NB | BsmtQual_TA | BsmtCond_Gd | BsmtCond_NB | BsmtCond_Po | BsmtCond_TA | BsmtExposure_Gd | BsmtExposure_Mn | BsmtExposure_NB | BsmtExposure_No | BsmtFinType1_BLQ | BsmtFinType1_GLQ | BsmtFinType1_LwQ | BsmtFinType1_NB | BsmtFinType1_Rec | BsmtFinType1_Unf | BsmtFinType2_BLQ | BsmtFinType2_GLQ | BsmtFinType2_LwQ | BsmtFinType2_NB | BsmtFinType2_Rec | BsmtFinType2_Unf | Heating_GasW | Heating_Grav | Heating_OthW | Heating_Wall | HeatingQC_Fa | HeatingQC_Gd | HeatingQC_Po | HeatingQC_TA | CentralAir_Y | Electrical_FuseF | Electrical_FuseP | Electrical_Mix | Electrical_SBrkr | Electrical_nan | KitchenQual_Fa | KitchenQual_Gd | KitchenQual_TA | Functional_Maj2 | Functional_Min1 | Functional_Min2 | Functional_Mod | Functional_Typ | FireplaceQu_Fa | FireplaceQu_Gd | FireplaceQu_NF | FireplaceQu_Po | FireplaceQu_TA | GarageType_Attchd | GarageType_Basment | GarageType_BuiltIn | GarageType_CarPort | GarageType_Detchd | GarageType_NG | GarageFinish_NG | GarageFinish_RFn | GarageFinish_Unf | GarageQual_Fa | GarageQual_Gd | GarageQual_NG | GarageQual_Po | GarageQual_TA | GarageCond_Fa | GarageCond_Gd | GarageCond_NG | GarageCond_Po | GarageCond_TA | PavedDrive_P | PavedDrive_Y | PoolQC_Fa | PoolQC_Gd | PoolQC_NP | Fence_GdWo | Fence_MnPrv | Fence_MnWw | Fence_NF | MiscFeature_NE | MiscFeature_Shed | MiscFeature_TenC | SaleType_CWD | SaleType_Con | SaleType_ConLD | SaleType_ConLI | SaleType_ConLw | SaleType_New | SaleType_Oth | SaleType_WD | SaleCondition_AdjLand | SaleCondition_Alloca | SaleCondition_Family | SaleCondition_Normal | SaleCondition_Partial | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.561119 | -0.150083 | 0.842112 | 0.534621 | -0.570667 | -0.301414 | -0.28229 | 0.009986 | -0.403595 | -0.628924 | -0.797828 | -0.1294 | -1.122539 | -1.03986 | -0.924643 | 0.824828 | 0.333696 | 0.518931 | -0.749421 | -0.685547 | -0.360803 | -0.112837 | -0.271032 | -0.069193 | -0.108754 | 0.232125 | -1.396198 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 1 | -0.653665 | -0.505027 | -0.986602 | -0.914670 | -0.570667 | -0.932224 | -0.28229 | 0.361595 | -0.714639 | 0.353300 | -0.797828 | -0.1294 | -0.394941 | 0.14623 | -0.319271 | 0.046080 | -2.330655 | -2.190237 | 1.284957 | -0.685547 | -0.360803 | -0.112837 | -0.271032 | -0.069193 | 1.343344 | -0.506915 | 1.618941 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 2 | 0.040432 | -0.183080 | -0.104901 | -0.866360 | -0.570667 | 0.850133 | -0.28229 | -1.121474 | -0.311597 | -0.633935 | -0.797828 | -0.1294 | -1.126251 | 0.14623 | -0.924643 | -0.158854 | -0.998480 | -0.609889 | -0.749421 | -0.685547 | -0.360803 | -0.112837 | -0.271032 | -0.069193 | -0.108754 | -0.506915 | 1.618941 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 3 | 0.688257 | -0.169171 | -0.170212 | -0.962979 | -0.009994 | 0.428887 | -0.28229 | -0.276260 | 0.073922 | 0.398402 | -0.797828 | -0.1294 | -0.361531 | 0.14623 | -0.319271 | -0.527734 | 0.333696 | -0.120733 | -0.749421 | 1.434220 | -0.360803 | -0.112837 | -0.271032 | -0.069193 | -0.108754 | 2.079725 | -1.396198 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 4 | -0.237207 | -0.361669 | 1.201324 | 1.066028 | 0.028864 | 0.473340 | -0.28229 | -0.053123 | 0.336776 | 0.107744 | -0.797828 | -0.1294 | -0.576840 | -1.03986 | -0.319271 | 1.193708 | 0.333696 | 0.763509 | 0.062783 | 0.143927 | -0.360803 | -0.112837 | -0.271032 | -0.069193 | -0.108754 | 0.971165 | -0.642413 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
5 rows × 277 columns
## Combine numeric and category columns of validation data.
test_data_final = pd.concat([X_test_scaler, catcols_test_dummy], axis=1)
## Display dimensions of validation data.
test_data_final.shape
(438, 277)
## Get first 5 records of validation data.
test_data_final.head()
| LotFrontage | LotArea | YearBuilt | YearRemodAdd | MasVnrArea | BsmtFinSF1 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | 1stFlrSF | 2ndFlrSF | LowQualFinSF | GrLivArea | BedroomAbvGr | TotRmsAbvGrd | GarageYrBlt | GarageCars | GarageArea | WoodDeckSF | OpenPorchSF | EnclosedPorch | 3SsnPorch | ScreenPorch | PoolArea | MiscVal | MoSold | YrSold | MSSubClass_160 | MSSubClass_180 | MSSubClass_190 | MSSubClass_20 | MSSubClass_30 | MSSubClass_40 | MSSubClass_45 | MSSubClass_50 | MSSubClass_60 | MSSubClass_70 | MSSubClass_75 | MSSubClass_80 | MSSubClass_85 | MSSubClass_90 | MSZoning_FV | MSZoning_RH | MSZoning_RL | MSZoning_RM | Street_Pave | Alley_NAA | Alley_Pave | LotShape_IR2 | LotShape_IR3 | LotShape_Reg | LandContour_HLS | LandContour_Low | LandContour_Lvl | Utilities_NoSeWa | LotConfig_CulDSac | LotConfig_FR2 | LotConfig_FR3 | LotConfig_Inside | LandSlope_Mod | LandSlope_Sev | Neighborhood_Blueste | Neighborhood_BrDale | Neighborhood_BrkSide | Neighborhood_ClearCr | Neighborhood_CollgCr | Neighborhood_Crawfor | Neighborhood_Edwards | Neighborhood_Gilbert | Neighborhood_IDOTRR | Neighborhood_MeadowV | Neighborhood_Mitchel | Neighborhood_NAmes | Neighborhood_NPkVill | Neighborhood_NWAmes | Neighborhood_NoRidge | Neighborhood_NridgHt | Neighborhood_OldTown | Neighborhood_SWISU | Neighborhood_Sawyer | Neighborhood_SawyerW | Neighborhood_Somerst | Neighborhood_StoneBr | Neighborhood_Timber | Neighborhood_Veenker | Condition1_Feedr | Condition1_Norm | Condition1_PosA | Condition1_PosN | Condition1_RRAe | Condition1_RRAn | Condition1_RRNe | Condition1_RRNn | Condition2_Feedr | Condition2_Norm | Condition2_PosA | Condition2_PosN | Condition2_RRAe | Condition2_RRAn | Condition2_RRNn | ... 
| Foundation_CBlock | Foundation_PConc | Foundation_Slab | Foundation_Stone | Foundation_Wood | BsmtQual_Fa | BsmtQual_Gd | BsmtQual_NB | BsmtQual_TA | BsmtCond_Gd | BsmtCond_NB | BsmtCond_Po | BsmtCond_TA | BsmtExposure_Gd | BsmtExposure_Mn | BsmtExposure_NB | BsmtExposure_No | BsmtFinType1_BLQ | BsmtFinType1_GLQ | BsmtFinType1_LwQ | BsmtFinType1_NB | BsmtFinType1_Rec | BsmtFinType1_Unf | BsmtFinType2_BLQ | BsmtFinType2_GLQ | BsmtFinType2_LwQ | BsmtFinType2_NB | BsmtFinType2_Rec | BsmtFinType2_Unf | Heating_GasW | Heating_Grav | Heating_OthW | Heating_Wall | HeatingQC_Fa | HeatingQC_Gd | HeatingQC_Po | HeatingQC_TA | CentralAir_Y | Electrical_FuseF | Electrical_FuseP | Electrical_Mix | Electrical_SBrkr | Electrical_nan | KitchenQual_Fa | KitchenQual_Gd | KitchenQual_TA | Functional_Maj2 | Functional_Min1 | Functional_Min2 | Functional_Mod | Functional_Typ | FireplaceQu_Fa | FireplaceQu_Gd | FireplaceQu_NF | FireplaceQu_Po | FireplaceQu_TA | GarageType_Attchd | GarageType_Basment | GarageType_BuiltIn | GarageType_CarPort | GarageType_Detchd | GarageType_NG | GarageFinish_NG | GarageFinish_RFn | GarageFinish_Unf | GarageQual_Fa | GarageQual_Gd | GarageQual_NG | GarageQual_Po | GarageQual_TA | GarageCond_Fa | GarageCond_Gd | GarageCond_NG | GarageCond_Po | GarageCond_TA | PavedDrive_P | PavedDrive_Y | PoolQC_Fa | PoolQC_Gd | PoolQC_NP | Fence_GdWo | Fence_MnPrv | Fence_MnWw | Fence_NF | MiscFeature_NE | MiscFeature_Shed | MiscFeature_TenC | SaleType_CWD | SaleType_Con | SaleType_ConLD | SaleType_ConLI | SaleType_ConLw | SaleType_New | SaleType_Oth | SaleType_WD | SaleCondition_AdjLand | SaleCondition_Alloca | SaleCondition_Family | SaleCondition_Normal | SaleCondition_Partial | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.052114 | -0.083176 | 1.005390 | 0.776169 | 0.428552 | -0.932224 | -0.28229 | 0.713204 | -0.372929 | -0.704094 | 1.847518 | -0.1294 | 0.958169 | 0.146230 | 0.891472 | 0.947788 | 0.333696 | -0.148954 | 0.364459 | 0.051763 | -0.360803 | -0.112837 | -0.271032 | -0.069193 | -0.108754 | -0.506915 | 1.618941 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 1 | -0.422299 | -0.073531 | -2.292826 | -1.687625 | -0.570667 | -0.932224 | -0.28229 | 1.188778 | 0.089255 | -0.175397 | 0.762719 | -0.1294 | 0.473722 | 1.332321 | 0.891472 | -2.372137 | 1.665871 | 1.473725 | -0.749421 | -0.685547 | 1.088712 | -0.112837 | -0.271032 | -0.069193 | -0.108754 | -0.506915 | -1.396198 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 2 | -1.717949 | -0.706764 | 1.103357 | 0.921098 | 0.078825 | -0.932224 | -0.28229 | 0.920563 | -0.171408 | -0.473572 | 0.884529 | -0.1294 | 0.351219 | 0.146230 | 0.286100 | 1.070748 | 0.333696 | -0.402938 | -0.749421 | -0.071122 | -0.360803 | -0.112837 | -0.271032 | -0.069193 | -0.108754 | -0.876435 | -1.396198 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 3 | 0.271798 | -0.058301 | 0.907423 | 0.631240 | -0.570667 | 1.131669 | -0.28229 | -0.979478 | 0.117731 | -0.142823 | 1.475193 | -0.1294 | 1.073249 | 0.146230 | 0.891472 | 0.824828 | 0.333696 | 0.551855 | 1.207604 | 1.925761 | -0.360803 | -0.112837 | -0.271032 | -0.069193 | -0.108754 | -0.137395 | -1.396198 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 4 | -0.422299 | 0.061502 | -0.823324 | -1.687625 | -0.570667 | 0.295527 | -0.28229 | -0.528697 | -0.309406 | -0.293164 | 0.374306 | -0.1294 | 0.072801 | 0.146230 | -0.319271 | -0.650694 | -0.998480 | -0.835653 | -0.749421 | -0.685547 | -0.360803 | -0.112837 | -0.271032 | -0.069193 | -0.108754 | -0.876435 | 0.111371 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
5 rows × 277 columns
## Display the number of levels in each categorical column of the test data.
for i in test_imp_cat:
    print(i, test_imp_cat[i].nunique())
MSSubClass 16
MSZoning 6
Street 2
Alley 3
LotShape 4
LandContour 4
Utilities 2
LotConfig 5
LandSlope 3
Neighborhood 25
Condition1 9
Condition2 5
BldgType 5
HouseStyle 7
OverallQual 10
OverallCond 9
RoofStyle 6
RoofMatl 4
Exterior1st 14
Exterior2nd 16
MasVnrType 5
ExterQual 4
ExterCond 5
Foundation 6
BsmtQual 5
BsmtCond 5
BsmtExposure 5
BsmtFinType1 7
BsmtFinType2 7
Heating 4
HeatingQC 5
CentralAir 2
Electrical 4
KitchenQual 5
Functional 8
FireplaceQu 6
GarageType 7
GarageFinish 4
GarageQual 5
GarageCond 6
PavedDrive 3
PoolQC 3
Fence 5
MiscFeature 4
SaleType 10
SaleCondition 6
## Check dimensions of category columns of test data before doing dummification.
test_imp_cat.shape
(1459, 46)
## Get dummies for the categorical columns of the test data, then display the dimensions and the first 5 records.
test_catcols_dummy = pd.get_dummies(columns = test_imp_cat.columns, data = test_imp_cat, drop_first= True)
print(test_catcols_dummy.shape)
test_catcols_dummy.head()
(1459, 245)
| | MSSubClass_150 | MSSubClass_160 | MSSubClass_180 | MSSubClass_190 | MSSubClass_20 | MSSubClass_30 | MSSubClass_40 | MSSubClass_45 | MSSubClass_50 | MSSubClass_60 | MSSubClass_70 | MSSubClass_75 | MSSubClass_80 | MSSubClass_85 | MSSubClass_90 | MSZoning_FV | MSZoning_RH | MSZoning_RL | MSZoning_RM | MSZoning_nan | Street_Pave | Alley_NAA | Alley_Pave | LotShape_IR2 | LotShape_IR3 | LotShape_Reg | LandContour_HLS | LandContour_Low | LandContour_Lvl | Utilities_nan | LotConfig_CulDSac | LotConfig_FR2 | LotConfig_FR3 | LotConfig_Inside | LandSlope_Mod | LandSlope_Sev | Neighborhood_Blueste | Neighborhood_BrDale | Neighborhood_BrkSide | Neighborhood_ClearCr | Neighborhood_CollgCr | Neighborhood_Crawfor | Neighborhood_Edwards | Neighborhood_Gilbert | Neighborhood_IDOTRR | Neighborhood_MeadowV | Neighborhood_Mitchel | Neighborhood_NAmes | Neighborhood_NPkVill | Neighborhood_NWAmes | Neighborhood_NoRidge | Neighborhood_NridgHt | Neighborhood_OldTown | Neighborhood_SWISU | Neighborhood_Sawyer | Neighborhood_SawyerW | Neighborhood_Somerst | Neighborhood_StoneBr | Neighborhood_Timber | Neighborhood_Veenker | Condition1_Feedr | Condition1_Norm | Condition1_PosA | Condition1_PosN | Condition1_RRAe | Condition1_RRAn | Condition1_RRNe | Condition1_RRNn | Condition2_Feedr | Condition2_Norm | Condition2_PosA | Condition2_PosN | BldgType_2fmCon | BldgType_Duplex | BldgType_Twnhs | BldgType_TwnhsE | HouseStyle_1.5Unf | HouseStyle_1Story | HouseStyle_2.5Unf | HouseStyle_2Story | HouseStyle_SFoyer | HouseStyle_SLvl | OverallQual_10 | OverallQual_2 | OverallQual_3 | OverallQual_4 | OverallQual_5 | OverallQual_6 | OverallQual_7 | OverallQual_8 | OverallQual_9 | OverallCond_2 | OverallCond_3 | OverallCond_4 | OverallCond_5 | OverallCond_6 | OverallCond_7 | OverallCond_8 | OverallCond_9 | RoofStyle_Gable | ... | ExterCond_TA | Foundation_CBlock | Foundation_PConc | Foundation_Slab | Foundation_Stone | Foundation_Wood | BsmtQual_Fa | BsmtQual_Gd | BsmtQual_NB | BsmtQual_TA | BsmtCond_Gd | BsmtCond_NB | BsmtCond_Po | BsmtCond_TA | BsmtExposure_Gd | BsmtExposure_Mn | BsmtExposure_NB | BsmtExposure_No | BsmtFinType1_BLQ | BsmtFinType1_GLQ | BsmtFinType1_LwQ | BsmtFinType1_NB | BsmtFinType1_Rec | BsmtFinType1_Unf | BsmtFinType2_BLQ | BsmtFinType2_GLQ | BsmtFinType2_LwQ | BsmtFinType2_NB | BsmtFinType2_Rec | BsmtFinType2_Unf | Heating_GasW | Heating_Grav | Heating_Wall | HeatingQC_Fa | HeatingQC_Gd | HeatingQC_Po | HeatingQC_TA | CentralAir_Y | Electrical_FuseF | Electrical_FuseP | Electrical_SBrkr | KitchenQual_Fa | KitchenQual_Gd | KitchenQual_TA | KitchenQual_nan | Functional_Maj2 | Functional_Min1 | Functional_Min2 | Functional_Mod | Functional_Sev | Functional_Typ | Functional_nan | FireplaceQu_Fa | FireplaceQu_Gd | FireplaceQu_NF | FireplaceQu_Po | FireplaceQu_TA | GarageType_Attchd | GarageType_Basment | GarageType_BuiltIn | GarageType_CarPort | GarageType_Detchd | GarageType_NG | GarageFinish_NG | GarageFinish_RFn | GarageFinish_Unf | GarageQual_Gd | GarageQual_NG | GarageQual_Po | GarageQual_TA | GarageCond_Fa | GarageCond_Gd | GarageCond_NG | GarageCond_Po | GarageCond_TA | PavedDrive_P | PavedDrive_Y | PoolQC_Gd | PoolQC_NP | Fence_GdWo | Fence_MnPrv | Fence_MnWw | Fence_NF | MiscFeature_NE | MiscFeature_Othr | MiscFeature_Shed | SaleType_CWD | SaleType_Con | SaleType_ConLD | SaleType_ConLI | SaleType_ConLw | SaleType_New | SaleType_Oth | SaleType_WD | SaleType_nan | SaleCondition_AdjLand | SaleCondition_Alloca | SaleCondition_Family | SaleCondition_Normal | SaleCondition_Partial |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | ... | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | ... | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | ... | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | ... | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | ... | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
5 rows × 245 columns
## Get the dummy columns that are present in the train data but missing from the test data.
missing_cols = set(catcols_train_dummy.columns) - set(test_catcols_dummy.columns)
## Add each missing column to the test set with a default value of 0.
for c in missing_cols:
    test_catcols_dummy[c] = 0
## Reorder the test columns to match the train columns; columns that exist only in the test data are dropped.
test_catcols_dummy = test_catcols_dummy[catcols_train_dummy.columns]
## Check dimensions of the categorical columns of the test data.
test_catcols_dummy.shape
(1459, 250)
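The add-missing-columns loop and the column reordering above can also be written as a single `reindex` call. The sketch below uses two tiny, made-up frames (`train_dummies` and `test_dummies` are hypothetical stand-ins, not the notebook's variables) to show what the alignment does: columns missing from the test frame are added with 0, and columns that appear only in the test frame are dropped.

```python
import pandas as pd

# Hypothetical toy frames standing in for the dummy-encoded train and test data.
train_dummies = pd.DataFrame(
    {"Street_Pave": [1, 0], "Alley_Pave": [0, 1], "RoofMatl_Metal": [0, 1]}
)
test_dummies = pd.DataFrame(
    {"Street_Pave": [1, 1], "Alley_Pave": [0, 0], "MSSubClass_150": [1, 0]}
)

# reindex keeps the train column order, fills columns absent from the
# test frame with 0, and drops columns that exist only in the test frame.
test_aligned = test_dummies.reindex(columns=train_dummies.columns, fill_value=0)

print(test_aligned.columns.tolist())
print(test_aligned.shape)
```

Dropping test-only columns is what a model fitted on the train columns requires, but it also means any level seen only in the test data is silently ignored.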
## Combine the categorical and numeric columns of the test data.
test_data_combine = pd.concat([test_scaler, test_catcols_dummy], axis=1)
############################################### Decision Tree ##################################################################
## Import decision tree model.
from sklearn.tree import DecisionTreeRegressor
## Instantiate and fit a regression model.
dtr = DecisionTreeRegressor(max_depth=5,min_samples_leaf=10,min_samples_split=5,random_state=123)
dtr.fit(train_data_final,y_train)
DecisionTreeRegressor(ccp_alpha=0.0, criterion='mse', max_depth=5,
max_features=None, max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=10, min_samples_split=5,
min_weight_fraction_leaf=0.0, presort='deprecated',
random_state=123, splitter='best')
## Get the predictions on train and validation data.
pred_train = dtr.predict(train_data_final)
pred_test = dtr.predict(test_data_final)
## Get predictions for test data.
test_pred = dtr.predict(test_data_combine)
## Check dimensions of test data index column.
test_data.index.shape
(1459,)
## Check dimensions of the test predictions.
test_pred.shape
(1459,)
## Prepare a dataframe with the test data index and the prediction values.
dataframe5 = pd.DataFrame({'Id' : test_data.index,
                           'SalePrice' : test_pred})
## Copy dataframe data into a CSV file.
dataframe5.to_csv('PredictionValues.csv',index=False)
## Import error metric libraries to measure RMSE.
from sklearn.metrics import mean_squared_error
from math import sqrt
## Display train and validation RMSE.
print("Train Error:",sqrt(mean_squared_error(y_train, pred_train)))
print("Test Error:",sqrt(mean_squared_error(y_test, pred_test)))
Train Error: 35145.50572611597
Test Error: 37672.08587542323
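The errors printed above are RMSE values on raw dollar prices, while the competition description scores submissions on RMSE between the logarithms of predicted and observed prices. A minimal sketch of both metrics follows; the helper names `rmse` and `rmse_log` and the three example prices are assumptions made for illustration, and the log version only works when every prediction is strictly positive.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-squared error on the values as given."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def rmse_log(y_true, y_pred):
    """RMSE between the natural logs of the values (requires positive inputs)."""
    return rmse(np.log(y_true), np.log(y_pred))

# Made-up prices: every prediction is 10% too high.
actual = np.array([100000.0, 200000.0, 400000.0])
predicted = np.array([110000.0, 220000.0, 440000.0])

print("RMSE (dollars):", rmse(actual, predicted))
print("RMSE (log scale):", rmse_log(actual, predicted))
```

In this toy case the dollar-scale RMSE is dominated by the most expensive house, whereas the log-scale RMSE treats the same relative error equally for every house.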
############################################## Random Forest ##################################################################
## Import random forest regressor model.
from sklearn.ensemble import RandomForestRegressor
## Instantiate a regressor model.
rc = RandomForestRegressor(n_estimators=25, max_depth=10)  ## Disabled options: min_samples_leaf=2, max_features='sqrt'
## Fit a model.
rc.fit(train_data_final,y_train)
RandomForestRegressor(bootstrap=True, ccp_alpha=0.0, criterion='mse',
max_depth=10, max_features='auto', max_leaf_nodes=None,
max_samples=None, min_impurity_decrease=0.0,
min_impurity_split=None, min_samples_leaf=1,
min_samples_split=2, min_weight_fraction_leaf=0.0,
n_estimators=25, n_jobs=None, oob_score=False,
random_state=None, verbose=0, warm_start=False)
## Get the predictions on train and validation data.
pred_train = rc.predict(train_data_final)
pred_test = rc.predict(test_data_final)
## Get predictions on test data.
test_pred = rc.predict(test_data_combine)
## Prepare a dataframe with the test data index and the prediction values.
dataframe6 = pd.DataFrame({'Id' : test_data.index,
                           'SalePrice' : test_pred})
## Copy dataframe data into a CSV file.
dataframe6.to_csv('PredictionValues.csv',index=False)
## Display RMSE values for train and validation data.
print("Train Error:",sqrt(mean_squared_error(y_train, pred_train)))
print("Test Error:",sqrt(mean_squared_error(y_test, pred_test)))
Train Error: 14189.808357485022
Test Error: 28007.627868664316
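The random forest above is created with `random_state=None`, so its errors will vary slightly from run to run. A small sketch on synthetic data (the data and the `fit_predict` helper are made up for illustration) shows that fixing `random_state` makes two fits produce identical predictions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data; not the notebook's train_data_final / y_train.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 5))
y = 180000 + 40000 * X[:, 0] + rng.normal(scale=5000, size=200)

def fit_predict(seed):
    """Fit a forest with a fixed seed and return its predictions on X."""
    model = RandomForestRegressor(n_estimators=25, max_depth=10, random_state=seed)
    return model.fit(X, y).predict(X)

# Two independent fits with the same seed give the same predictions.
same_seed_match = np.array_equal(fit_predict(123), fit_predict(123))
print("Same seed, same predictions:", same_seed_match)
```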
################################################### AdaBoost ##################################################################
## Import adaboost regressor model.
from sklearn.ensemble import AdaBoostRegressor
## Instantiate regressor model and fit it.
Adaboost_model = AdaBoostRegressor(n_estimators=50,learning_rate=1)
%time Adaboost_model.fit(train_data_final, y_train)
Wall time: 424 ms
AdaBoostRegressor(base_estimator=None, learning_rate=1.0, loss='linear',
n_estimators=50, random_state=None)
## Get the predictions on train and validation data.
pred_train = Adaboost_model.predict(train_data_final)
pred_test = Adaboost_model.predict(test_data_final)
## Get predictions on test data.
test_pred = Adaboost_model.predict(test_data_combine)
## Display RMSE value for train and validation data.
print("Train Error:",sqrt(mean_squared_error(y_train, pred_train)))
print("Test Error:",sqrt(mean_squared_error(y_test, pred_test)))
Train Error: 30520.585335079977
Test Error: 36586.41368072027
##################################################### GradientBoosting #########################################################
## Import the gradient boosting model library.
from sklearn.ensemble import GradientBoostingRegressor
## Instantiate GBR and fit it.
gbm = GradientBoostingRegressor(n_estimators=50,learning_rate=0.8,random_state=474)
%time gbm.fit(X=train_data_final, y=y_train)
Wall time: 318 ms
GradientBoostingRegressor(alpha=0.9, ccp_alpha=0.0, criterion='friedman_mse',
init=None, learning_rate=0.8, loss='ls', max_depth=3,
max_features=None, max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=50,
n_iter_no_change=None, presort='deprecated',
random_state=474, subsample=1.0, tol=0.0001,
validation_fraction=0.1, verbose=0, warm_start=False)
## Get the predictions on train and validation.
pred_train = gbm.predict(train_data_final)
pred_test = gbm.predict(test_data_final)
## Get predictions on test data.
test_pred = gbm.predict(test_data_combine)
## Display RMSE values for train and validation data.
print("Train Error:",sqrt(mean_squared_error(y_train, pred_train)))
print("Test Error:",sqrt(mean_squared_error(y_test, pred_test)))
Train Error: 8570.43403879048
Test Error: 30745.40731980334
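The train error here is much lower than the validation error, which suggests the model is overfitting; a learning rate of 0.8 is also on the high side. One common response, sketched below on synthetic data, is a smaller learning rate combined with the early-stopping parameters `n_iter_no_change` and `validation_fraction` that appear in the estimator printout above. The specific values (500 trees, rate 0.1, patience 10) are illustrative assumptions, not tuned settings for this dataset.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-in data; not the notebook's train_data_final / y_train.
rng = np.random.RandomState(474)
X = rng.normal(size=(300, 5))
y = (180000 + 40000 * X[:, 0] + 15000 * X[:, 1] ** 2
     + rng.normal(scale=10000, size=300))

# Hold out 10% of the training rows internally and stop adding trees once
# the held-out score has not improved for 10 consecutive iterations.
gbm_es = GradientBoostingRegressor(
    n_estimators=500,
    learning_rate=0.1,
    validation_fraction=0.1,
    n_iter_no_change=10,
    random_state=474,
)
gbm_es.fit(X, y)

# n_estimators_ is the number of trees actually kept after early stopping.
print("Trees used:", gbm_es.n_estimators_)
```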
################################################## XGBoost ##########################################################
## Import XGBoost model library.
import xgboost as xgb
from xgboost.sklearn import XGBRegressor
## Instantiate XGBR and fit it.
xgb_model=XGBRegressor(n_estimators=100,learning_rate=0.8)
%time xgb_model.fit(train_data_final,y_train,verbose=True)
C:\Users\nagar\Anaconda3\lib\site-packages\xgboost\core.py:587: FutureWarning: Series.base is deprecated and will be removed in a future version
  if getattr(data, 'base', None) is not None and \
[00:56:41] WARNING: C:/Jenkins/workspace/xgboost-win64_release_0.90/src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.
Wall time: 1.42 s
XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, gamma=0,
importance_type='gain', learning_rate=0.8, max_delta_step=0,
max_depth=3, min_child_weight=1, missing=None, n_estimators=100,
n_jobs=1, nthread=None, objective='reg:linear', random_state=0,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
silent=None, subsample=1, verbosity=1)
## Get the predictions on train and validation.
pred_train = xgb_model.predict(train_data_final)
pred_test = xgb_model.predict(test_data_final)
## Get predictions on test data.
test_pred = xgb_model.predict(test_data_combine)
## Prepare a dataframe with the test index and the prediction values.
dataframe7 = pd.DataFrame({'Id' : test_data.index,
                           'SalePrice' : test_pred})
## Copy dataframe data into a CSV file.
dataframe7.to_csv('PredictionValues.csv',index=False)
## Display scatter plot for actual target and prediction values.
plt.figure(figsize=(15,8))
plt.scatter(y_train,pred_train, c= 'brown')
plt.xlabel('Y Train')
plt.ylabel('Predicted Y')
plt.show()
## Get RMSE value for train and validation data.
print("Train Error:",sqrt(mean_squared_error(y_train, pred_train)))
print("Test Error:",sqrt(mean_squared_error(y_test, pred_test)))
Train Error: 4641.200474416745
Test Error: 32621.566438550162
#################################################### SVM #######################################################################
## Import SVR model library.
from sklearn.svm import SVR
## Instantiate SVR model.
svr_model = SVR()
svr_model
SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1, gamma='scale',
kernel='rbf', max_iter=-1, shrinking=True, tol=0.001, verbose=False)
## Fit a model.
svr_model.fit(X = train_data_final, y = y_train)
SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1, gamma='scale',
kernel='rbf', max_iter=-1, shrinking=True, tol=0.001, verbose=False)
## Get the predictions on train and validation.
pred_train = svr_model.predict(train_data_final)
pred_test = svr_model.predict(test_data_final)
## Get predictions on test data.
test_pred = svr_model.predict(test_data_combine)
## Display RMSE values for train and validation data.
print("Train Error:",sqrt(mean_squared_error(y_train, pred_train)))
print("Test Error:",sqrt(mean_squared_error(y_test, pred_test)))
Train Error: 83315.89848130586
Test Error: 79142.24398988248
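The SVR errors are far larger than those of the other models. A likely cause is that the default `C=1.0` and `epsilon=0.1` are tiny relative to a target measured in dollars, so the fitted function can barely move away from a constant. The sketch below, on synthetic data, compares a default `SVR` with the same model wrapped in `TransformedTargetRegressor` so that the target is standardized before fitting and predictions are mapped back to dollars. The data and variable names are made up for illustration.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data with a dollar-scale target.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 5))
y = 180000 + 50000 * X[:, 0] + 20000 * X[:, 1] + rng.normal(scale=5000, size=200)

# Default SVR fitted directly on the raw target.
raw_svr = SVR().fit(X, y)

# Same SVR, but the target is standardized during fit and the
# predictions are inverse-transformed back to the original scale.
scaled_svr = TransformedTargetRegressor(
    regressor=SVR(), transformer=StandardScaler()
).fit(X, y)

rmse_raw = np.sqrt(mean_squared_error(y, raw_svr.predict(X)))
rmse_scaled = np.sqrt(mean_squared_error(y, scaled_svr.predict(X)))
print("RMSE, raw target:", rmse_raw)
print("RMSE, standardized target:", rmse_scaled)
```

Raising `C` by several orders of magnitude is another option; scaling the target simply keeps the default hyperparameters in a sensible range.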
################################################### KNN #######################################################################
## Import KNN model library.
from sklearn.neighbors import KNeighborsRegressor
## Instantiate KNN model and fit it.
knn = KNeighborsRegressor(algorithm = 'brute', n_neighbors = 4,
metric = "euclidean")
knn.fit(train_data_final, y_train)
KNeighborsRegressor(algorithm='brute', leaf_size=30, metric='euclidean',
metric_params=None, n_jobs=None, n_neighbors=4, p=2,
weights='uniform')
## Get the predictions on train and validation.
pred_train = knn.predict(train_data_final)
pred_test = knn.predict(test_data_final)
## Get predictions on test data.
test_pred = knn.predict(test_data_combine)
## Display RMSE values for train and validation data.
print("Train Error:",sqrt(mean_squared_error(y_train, pred_train)))
print("Test Error:",sqrt(mean_squared_error(y_test, pred_test)))
Train Error: 31531.213116588428
Test Error: 37136.77105383857
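The KNN model above fixes `n_neighbors = 4`. A hedged sketch of picking that value by cross-validation instead is shown below, using `GridSearchCV` on synthetic data; the candidate grid `[2, 4, 8, 16]` and the 5-fold split are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data; not the notebook's train_data_final / y_train.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 5))
y = 180000 + 40000 * X[:, 0] + rng.normal(scale=10000, size=200)

# Score each candidate k by cross-validated RMSE (sklearn reports it negated,
# so that larger is better).
search = GridSearchCV(
    KNeighborsRegressor(algorithm="brute", metric="euclidean"),
    param_grid={"n_neighbors": [2, 4, 8, 16]},
    scoring="neg_root_mean_squared_error",
    cv=5,
)
search.fit(X, y)

print("Best k:", search.best_params_["n_neighbors"])
print("Cross-validated RMSE:", -search.best_score_)
```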
############################################# Neural Network Linear Algorithm #################################################
## Import Sequential,Dense model libraries.
from keras.models import Sequential
from keras.layers import Dense
C:\Users\nagar\Anaconda3\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
## Instantiate a sequential model.
model = Sequential()
## Add a single dense layer with one output unit (a linear model).
model.add(Dense(1, input_dim=train_data_final.shape[1]))
## Compile the model with MSE loss and the RMSprop optimizer.
model.compile(loss='mse', optimizer='rmsprop')
## Fit a model.
model.fit(train_data_final, y_train, epochs=150, batch_size=32)
Epoch 1/150
1022/1022 [==============================] - 0s 184us/step - loss: 38950132341.2290
Epoch 2/150
1022/1022 [==============================] - 0s 65us/step - loss: 38949634272.4384
Epoch 3/150
1022/1022 [==============================] - 0s 73us/step - loss: 38949164879.6556
Epoch 4/150
1022/1022 [==============================] - 0s 65us/step - loss: 38948694541.0254
Epoch 5/150
1022/1022 [==============================] - 0s 63us/step - loss: 38948221424.9706
Epoch 6/150
1022/1022 [==============================] - 0s 62us/step - loss: 38947748821.9178
Epoch 7/150
1022/1022 [==============================] - 0s 63us/step - loss: 38947278976.2505
Epoch 8/150
1022/1022 [==============================] - 0s 64us/step - loss: 38946805475.4442
Epoch 9/150
1022/1022 [==============================] - 0s 64us/step - loss: 38946334836.2270
Epoch 10/150
1022/1022 [==============================] - 0s 63us/step - loss: 38945865423.4051
Epoch 11/150
1022/1022 [==============================] - 0s 45us/step - loss: 38945391818.3953
Epoch 12/150
1022/1022 [==============================] - 0s 30us/step - loss: 38944927595.7104
Epoch 13/150
1022/1022 [==============================] - 0s 38us/step - loss: 38944455750.1370
Epoch 14/150
1022/1022 [==============================] - 0s 62us/step - loss: 38943984902.5127
Epoch 15/150
1022/1022 [==============================] - 0s 68us/step - loss: 38943514335.4364
Epoch 16/150
1022/1022 [==============================] - 0s 66us/step - loss: 38943043559.9530
Epoch 17/150
1022/1022 [==============================] - 0s 114us/step - loss: 38942572123.1781
Epoch 18/150
1022/1022 [==============================] - 0s 118us/step - loss: 38942101592.1722
Epoch 19/150
1022/1022 [==============================] - 0s 105us/step - loss: 38941631490.0039
Epoch 20/150
1022/1022 [==============================] - 0s 113us/step - loss: 38941163191.3581
Epoch 21/150
1022/1022 [==============================] - 0s 98us/step - loss: 38940687963.1781
Epoch 22/150
1022/1022 [==============================] - 0s 113us/step - loss: 38940215340.0861
Epoch 23/150
1022/1022 [==============================] - 0s 126us/step - loss: 38939746087.5773
Epoch 24/150
1022/1022 [==============================] - 0s 118us/step - loss: 38939272278.1683
Epoch 25/150
1022/1022 [==============================] - 0s 119us/step - loss: 38938800705.1272
Epoch 26/150
1022/1022 [==============================] - 0s 102us/step - loss: 38938326987.8982
Epoch 27/150
1022/1022 [==============================] - 0s 128us/step - loss: 38937856156.3053
Epoch 28/150
1022/1022 [==============================] - 0s 103us/step - loss: 38937385749.5421
Epoch 29/150
1022/1022 [==============================] - 0s 128us/step - loss: 38936916128.3131
Epoch 30/150
1022/1022 [==============================] - 0s 139us/step - loss: 38936443036.3053
Epoch 31/150
1022/1022 [==============================] - 0s 124us/step - loss: 38935966850.2544
Epoch 32/150
1022/1022 [==============================] - 0s 125us/step - loss: 38935498335.1859
Epoch 33/150
1022/1022 [==============================] - 0s 121us/step - loss: 38935027595.7730
Epoch 34/150
1022/1022 [==============================] - 0s 126us/step - loss: 38934554575.9061
Epoch 35/150
1022/1022 [==============================] - 0s 117us/step - loss: 38934078297.6751
Epoch 36/150
1022/1022 [==============================] - 0s 127us/step - loss: 38933606452.1018
Epoch 37/150
1022/1022 [==============================] - 0s 123us/step - loss: 38933137327.8434
Epoch 38/150
1022/1022 [==============================] - 0s 103us/step - loss: 38932665971.2250
Epoch 39/150
1022/1022 [==============================] - 0s 147us/step - loss: 38932197985.1898
Epoch 40/150
1022/1022 [==============================] - 0s 148us/step - loss: 38931723446.3562
Epoch 41/150
1022/1022 [==============================] - 0s 128us/step - loss: 38931247705.1742
Epoch 42/150
1022/1022 [==============================] - 0s 124us/step - loss: 38930781755.1155
Epoch 43/150
1022/1022 [==============================] - 0s 121us/step - loss: 38930308238.2779
Epoch 44/150
1022/1022 [==============================] - 0s 112us/step - loss: 38929839057.9100
Epoch 45/150
1022/1022 [==============================] - 0s 109us/step - loss: 38929368971.7730
Epoch 46/150
1022/1022 [==============================] - 0s 98us/step - loss: 38928897426.7867
Epoch 47/150
1022/1022 [==============================] - 0s 101us/step - loss: 38928424799.6869
Epoch 48/150
1022/1022 [==============================] - 0s 116us/step - loss: 38927952926.0587
Epoch 49/150
1022/1022 [==============================] - 0s 105us/step - loss: 38927480607.5616
Epoch 50/150
1022/1022 [==============================] - 0s 105us/step - loss: 38927012705.6908
Epoch 51/150
1022/1022 [==============================] - 0s 112us/step - loss: 38926542968.2348
Epoch 52/150
1022/1022 [==============================] - 0s 97us/step - loss: 38926072485.3229
Epoch 53/150
1022/1022 [==============================] - 0s 101us/step - loss: 38925599024.5949
Epoch 54/150
1022/1022 [==============================] - 0s 115us/step - loss: 38925129114.8023
Epoch 55/150
1022/1022 [==============================] - 0s 106us/step - loss: 38924656904.5166
Epoch 56/150
1022/1022 [==============================] - 0s 101us/step - loss: 38924186962.6614
Epoch 57/150
1022/1022 [==============================] - 0s 110us/step - loss: 38923716331.4599
Epoch 58/150
1022/1022 [==============================] - 0s 98us/step - loss: 38923243115.2094
Epoch 59/150
1022/1022 [==============================] - 0s 40us/step - loss: 38922773994.9589
Epoch 60/150
1022/1022 [==============================] - 0s 50us/step - loss: 38922302289.6595
Epoch 61/150
1022/1022 [==============================] - 0s 113us/step - loss: 38921833646.3405
Epoch 62/150
1022/1022 [==============================] - 0s 117us/step - loss: 38921363275.6477
Epoch 63/150
1022/1022 [==============================] - 0s 113us/step - loss: 38920891438.0900
Epoch 64/150
1022/1022 [==============================] - 0s 127us/step - loss: 38920420109.5264
Epoch 65/150
1022/1022 [==============================] - 0s 115us/step - loss: 38919952404.0391
Epoch 66/150
1022/1022 [==============================] - 0s 121us/step - loss: 38919485283.6947
Epoch 67/150
1022/1022 [==============================] - 0s 139us/step - loss: 38919016111.3425
Epoch 68/150
1022/1022 [==============================] - 0s 128us/step - loss: 38918539676.8063
Epoch 69/150
1022/1022 [==============================] - 0s 145us/step - loss: 38918067819.2094
Epoch 70/150
1022/1022 [==============================] - 0s 104us/step - loss: 38917594639.0294
Epoch 71/150
1022/1022 [==============================] - 0s 71us/step - loss: 38917127137.9413
Epoch 72/150
1022/1022 [==============================] - 0s 79us/step - loss: 38916659083.7730
Epoch 73/150
1022/1022 [==============================] - 0s 75us/step - loss: 38916187659.0215
Epoch 74/150
1022/1022 [==============================] - 0s 69us/step - loss: 38915717332.4149
Epoch 75/150
1022/1022 [==============================] - 0s 65us/step - loss: 38915245623.1076
Epoch 76/150
1022/1022 [==============================] - 0s 64us/step - loss: 38914777031.8904
Epoch 77/150
1022/1022 [==============================] - 0s 63us/step - loss: 38914304348.6810
Epoch 78/150
1022/1022 [==============================] - 0s 65us/step - loss: 38913836699.3033
Epoch 79/150
1022/1022 [==============================] - 0s 61us/step - loss: 38913364400.8454
Epoch 80/150
1022/1022 [==============================] - 0s 62us/step - loss: 38912893204.5401
Epoch 81/150
1022/1022 [==============================] - 0s 68us/step - loss: 38912422236.6810
Epoch 82/150
1022/1022 [==============================] - 0s 64us/step - loss: 38911953693.5577
Epoch 83/150
1022/1022 [==============================] - 0s 67us/step - loss: 38911478044.5558
Epoch 84/150
1022/1022 [==============================] - 0s 65us/step - loss: 38911008928.3131
Epoch 85/150
1022/1022 [==============================] - 0s 63us/step - loss: 38910541519.4051
Epoch 86/150
1022/1022 [==============================] - 0s 65us/step - loss: 38910071152.7202
Epoch 87/150
1022/1022 [==============================] - 0s 68us/step - loss: 38909592698.2387
Epoch 88/150
1022/1022 [==============================] - 0s 63us/step - loss: 38909126179.0685
Epoch 89/150
1022/1022 [==============================] - 0s 65us/step - loss: 38908655884.5245
Epoch 90/150
1022/1022 [==============================] - 0s 68us/step - loss: 38908182511.9687
Epoch 91/150
1022/1022 [==============================] - 0s 62us/step - loss: 38907713459.8513
Epoch 92/150
1022/1022 [==============================] - 0s 64us/step - loss: 38907244191.3112
Epoch 93/150
1022/1022 [==============================] - 0s 65us/step - loss: 38906776886.6067
Epoch 94/150
1022/1022 [==============================] - 0s 64us/step - loss: 38906303602.2231
Epoch 95/150
1022/1022 [==============================] - 0s 66us/step - loss: 38905834966.9198
Epoch 96/150
1022/1022 [==============================] - 0s 65us/step - loss: 38905361722.6145
Epoch 97/150
1022/1022 [==============================] - 0s 42us/step - loss: 38904891412.0391
Epoch 98/150
1022/1022 [==============================] - 0s 34us/step - loss: 38904422840.8611
Epoch 99/150
1022/1022 [==============================] - 0s 36us/step - loss: 38903954566.2622
Epoch 100/150
1022/1022 [==============================] - 0s 49us/step - loss: 38903485666.4423
Epoch 101/150
1022/1022 [==============================] - 0s 66us/step - loss: 38903014397.9961
Epoch 102/150
1022/1022 [==============================] - 0s 68us/step - loss: 38902547193.4873
Epoch 103/150
1022/1022 [==============================] - 0s 68us/step - loss: 38902074333.9335
Epoch 104/150
1022/1022 [==============================] - 0s 65us/step - loss: 38901601746.9119
Epoch 105/150
1022/1022 [==============================] - 0s 64us/step - loss: 38901134750.8102
Epoch 106/150
1022/1022 [==============================] - 0s 65us/step - loss: 38900659698.9746
Epoch 107/150
1022/1022 [==============================] - 0s 62us/step - loss: 38900187865.4247
Epoch 108/150
1022/1022 [==============================] - 0s 71us/step - loss: 38899720721.0333
Epoch 109/150
1022/1022 [==============================] - 0s 67us/step - loss: 38899249524.7280
Epoch 110/150
1022/1022 [==============================] - 0s 64us/step - loss: 38898777454.7162
Epoch 111/150
1022/1022 [==============================] - 0s 69us/step - loss: 38898310150.0117
Epoch 112/150
1022/1022 [==============================] - 0s 69us/step - loss: 38897836921.7378
Epoch 113/150
1022/1022 [==============================] - 0s 67us/step - loss: 38897367448.7984
Epoch 114/150
1022/1022 [==============================] - 0s 39us/step - loss: 38896895298.6301
Epoch 115/150
1022/1022 [==============================] - 0s 47us/step - loss: 38896427416.7984
Epoch 116/150
1022/1022 [==============================] - 0s 60us/step - loss: 38895957619.2250
Epoch 117/150
1022/1022 [==============================] - 0s 62us/step - loss: 38895486386.8493
Epoch 118/150
1022/1022 [==============================] - 0s 65us/step - loss: 38895017154.3796
Epoch 119/150
1022/1022 [==============================] - 0s 70us/step - loss: 38894545220.6341
Epoch 120/150
1022/1022 [==============================] - 0s 68us/step - loss: 38894076917.9804
Epoch 121/150
1022/1022 [==============================] - 0s 70us/step - loss: 38893607437.0254
Epoch 122/150
1022/1022 [==============================] - 0s 59us/step - loss: 38893137831.8278
Epoch 123/150
1022/1022 [==============================] - 0s 71us/step - loss: 38892668459.0841
Epoch 124/150
1022/1022 [==============================] - 0s 74us/step - loss: 38892196096.5010
Epoch 125/150
1022/1022 [==============================] - 0s 66us/step - loss: 38891727821.9022
Epoch 126/150
1022/1022 [==============================] - 0s 76us/step - loss: 38891260501.1663
Epoch 127/150
1022/1022 [==============================] - 0s 77us/step - loss: 38890792571.2407
Epoch 128/150
1022/1022 [==============================] - 0s 68us/step - loss: 38890321996.1487
Epoch 129/150
1022/1022 [==============================] - 0s 45us/step - loss: 38889853160.4540
Epoch 130/150
1022/1022 [==============================] - 0s 37us/step - loss: 38889383659.4599
Epoch 131/150
1022/1022 [==============================] - 0s 38us/step - loss: 38888916731.4912
Epoch 132/150
1022/1022 [==============================] - 0s 45us/step - loss: 38888446685.4325
Epoch 133/150
1022/1022 [==============================] - 0s 75us/step - loss: 38887980827.5538
Epoch 134/150
1022/1022 [==============================] - 0s 62us/step - loss: 38887511867.6164
Epoch 135/150
1022/1022 [==============================] - 0s 66us/step - loss: 38887042005.9178
Epoch 136/150
1022/1022 [==============================] - 0s 72us/step - loss: 38886571346.6614
Epoch 137/150
1022/1022 [==============================] - 0s 72us/step - loss: 38886098130.4110
Epoch 138/150
1022/1022 [==============================] - 0s 69us/step - loss: 38885629767.6399
Epoch 139/150
1022/1022 [==============================] - 0s 66us/step - loss: 38885158342.8885
Epoch 140/150
1022/1022 [==============================] - 0s 65us/step - loss: 38884692785.5969
Epoch 141/150
1022/1022 [==============================] - 0s 67us/step - loss: 38884222058.2074
Epoch 142/150
1022/1022 [==============================] - 0s 69us/step - loss: 38883749519.2798
Epoch 143/150
1022/1022 [==============================] - 0s 67us/step - loss: 38883275232.9393
Epoch 144/150
1022/1022 [==============================] - 0s 63us/step - loss: 38882804938.3953
Epoch 145/150
1022/1022 [==============================] - 0s 63us/step - loss: 38882336046.5910
Epoch 146/150
1022/1022 [==============================] - 0s 42us/step - loss: 38881863659.9609
Epoch 147/150
1022/1022 [==============================] - 0s 52us/step - loss: 38881392503.7339
Epoch 148/150
1022/1022 [==============================] - 0s 57us/step - loss: 38880921403.6164
Epoch 149/150
1022/1022 [==============================] - 0s 64us/step - loss: 38880450804.4775
Epoch 150/150
1022/1022 [==============================] - 0s 61us/step - loss: 38879982473.7691
<keras.callbacks.callbacks.History at 0x209042bbac8>
## Get the predictions on train and validation.
pred_train = model.predict(train_data_final)
pred_test = model.predict(test_data_final)
## Get predictions on test data.
test_pred = model.predict(test_data_combine)
## Display RMSE value for train and validation.
print("Train Error:",sqrt(mean_squared_error(y_train, pred_train)))
print("Test Error:",sqrt(mean_squared_error(y_test, pred_test)))
Train Error: 197179.45735965777
Test Error: 197930.06369128206
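The errors printed above are in dollars, whereas the competition metric quoted at the top of this notebook is the RMSE between the logarithm of the predicted price and the logarithm of the observed price. The sketch below shows both side by side. The helper names `rmse` and `rmse_of_logs`, the `floor` clipping value, and the sample prices are assumptions made for illustration; they are not part of the original notebook.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-squared error in the units of the target (dollars here)."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def rmse_of_logs(y_true, y_pred, floor=1.0):
    """RMSE between log(y_pred) and log(y_true).

    `floor` is an assumption: predictions are clipped to at least 1
    so that the logarithm is always defined.
    """
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.clip(np.asarray(y_pred, dtype=float).ravel(), floor, None)
    return float(np.sqrt(np.mean((np.log(y_pred) - np.log(y_true)) ** 2)))

# Made-up prices, for illustration only.
y_true = np.array([100_000.0, 200_000.0, 400_000.0])
y_pred = np.array([110_000.0, 220_000.0, 440_000.0])  # every prediction is 10% high

print("RMSE (dollars):", rmse(y_true, y_pred))
print("RMSE of logs:", rmse_of_logs(y_true, y_pred))
```

Because every made-up prediction is off by the same ratio, the log-scale error is identical for the cheap and the expensive house, which is the property the metric description refers to. In this notebook the same helper could be called as `rmse_of_logs(y_train, pred_train)`.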
## Instantiate sequential model.
model1 = Sequential()
## Add 2 dense layers to the model.
model1.add(Dense(8, input_dim=train_data_final.shape[1], activation='relu', kernel_initializer='uniform'))
model1.add(Dense(1, kernel_initializer='uniform'))
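## Compile the model.
## Note: this cell originally called model.compile / model.fit, so the log and errors recorded below
## come from that run, in which `model` kept training and `model1` was never fitted.
model1.compile(loss='mse', optimizer='rmsprop')
## Fit the model.
model1.fit(train_data_final, y_train, epochs=150, batch_size=32)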
Epoch 1/150 1022/1022 [==============================] - 0s 101us/step - loss: 38879393034.5205 Epoch 2/150 1022/1022 [==============================] - 0s 34us/step - loss: 38878900817.1585 Epoch 3/150 1022/1022 [==============================] - 0s 34us/step - loss: 38878430346.2701 Epoch 4/150 1022/1022 [==============================] - 0s 32us/step - loss: 38877960348.3053 Epoch 5/150 1022/1022 [==============================] - 0s 35us/step - loss: 38877490029.7143 Epoch 6/150 1022/1022 [==============================] - 0s 32us/step - loss: 38877020520.7045 Epoch 7/150 1022/1022 [==============================] - 0s 40us/step - loss: 38876549120.0000 Epoch 8/150 1022/1022 [==============================] - ETA: 0s - loss: 53787824128.000 - 0s 34us/step - loss: 38876078805.4168 Epoch 9/150 1022/1022 [==============================] - 0s 32us/step - loss: 38875613256.1409 Epoch 10/150 1022/1022 [==============================] - 0s 35us/step - loss: 38875140568.9237 Epoch 11/150 1022/1022 [==============================] - 0s 37us/step - loss: 38874668246.4188 Epoch 12/150 1022/1022 [==============================] - 0s 37us/step - loss: 38874196156.3679 Epoch 13/150 1022/1022 [==============================] - 0s 44us/step - loss: 38873722928.0939 Epoch 14/150 1022/1022 [==============================] - 0s 39us/step - loss: 38873252593.4716 Epoch 15/150 1022/1022 [==============================] - 0s 46us/step - loss: 38872781441.2524 Epoch 16/150 1022/1022 [==============================] - 0s 45us/step - loss: 38872309158.8258 Epoch 17/150 1022/1022 [==============================] - 0s 45us/step - loss: 38871834415.5930 Epoch 18/150 1022/1022 [==============================] - 0s 54us/step - loss: 38871366469.6360 Epoch 19/150 1022/1022 [==============================] - 0s 61us/step - loss: 38870895597.9648 Epoch 20/150 1022/1022 [==============================] - 0s 64us/step - loss: 38870424926.6849 Epoch 21/150 1022/1022 
[==============================] - 0s 62us/step - loss: 38869953277.4951 Epoch 22/150 1022/1022 [==============================] - 0s 63us/step - loss: 38869486557.9335 Epoch 23/150 1022/1022 [==============================] - 0s 67us/step - loss: 38869017337.4873 Epoch 24/150 1022/1022 [==============================] - 0s 32us/step - loss: 38868544582.1370 Epoch 25/150 1022/1022 [==============================] - 0s 31us/step - loss: 38868074804.6027 Epoch 26/150 1022/1022 [==============================] - 0s 39us/step - loss: 38867604017.0959 Epoch 27/150 1022/1022 [==============================] - 0s 45us/step - loss: 38867137850.6145 Epoch 28/150 1022/1022 [==============================] - 0s 62us/step - loss: 38866666229.4795 Epoch 29/150 1022/1022 [==============================] - 0s 63us/step - loss: 38866194840.7984 Epoch 30/150 1022/1022 [==============================] - 0s 65us/step - loss: 38865725179.4912 Epoch 31/150 1022/1022 [==============================] - 0s 61us/step - loss: 38865252632.5479 Epoch 32/150 1022/1022 [==============================] - 0s 63us/step - loss: 38864782790.8885 Epoch 33/150 1022/1022 [==============================] - 0s 61us/step - loss: 38864315578.3640 Epoch 34/150 1022/1022 [==============================] - 0s 61us/step - loss: 38863844826.9276 Epoch 35/150 1022/1022 [==============================] - 0s 59us/step - loss: 38863375366.0117 Epoch 36/150 1022/1022 [==============================] - 0s 58us/step - loss: 38862903973.3229 Epoch 37/150 1022/1022 [==============================] - 0s 50us/step - loss: 38862436492.2740 Epoch 38/150 1022/1022 [==============================] - 0s 40us/step - loss: 38861967436.1487 Epoch 39/150 1022/1022 [==============================] - 0s 49us/step - loss: 38861495686.7632 Epoch 40/150 1022/1022 [==============================] - 0s 59us/step - loss: 38861025031.5147 Epoch 41/150 1022/1022 [==============================] - 0s 64us/step - loss: 38860555013.5108 Epoch 
42/150 1022/1022 [==============================] - 0s 49us/step - loss: 38860082999.6086 Epoch 43/150 1022/1022 [==============================] - 0s 33us/step - loss: 38859612272.2192 Epoch 44/150 1022/1022 [==============================] - 0s 36us/step - loss: 38859143396.4462 Epoch 45/150 1022/1022 [==============================] - 0s 38us/step - loss: 38858674424.4853 Epoch 46/150 1022/1022 [==============================] - 0s 54us/step - loss: 38858204354.3796 Epoch 47/150 1022/1022 [==============================] - 0s 67us/step - loss: 38857734532.7593 Epoch 48/150 1022/1022 [==============================] - 0s 67us/step - loss: 38857264013.7769 Epoch 49/150 1022/1022 [==============================] - 0s 43us/step - loss: 38856790400.7515 Epoch 50/150 1022/1022 [==============================] - 0s 40us/step - loss: 38856322438.7632 Epoch 51/150 1022/1022 [==============================] - 0s 64us/step - loss: 38855850036.1018 Epoch 52/150 1022/1022 [==============================] - 0s 84us/step - loss: 38855380947.9139 Epoch 53/150 1022/1022 [==============================] - 0s 89us/step - loss: 38854905615.5303 Epoch 54/150 1022/1022 [==============================] - 0s 84us/step - loss: 38854435296.9393 Epoch 55/150 1022/1022 [==============================] - 0s 82us/step - loss: 38853971006.1213 Epoch 56/150 1022/1022 [==============================] - 0s 86us/step - loss: 38853499332.8845 Epoch 57/150 1022/1022 [==============================] - 0s 80us/step - loss: 38853030064.3444 Epoch 58/150 1022/1022 [==============================] - 0s 50us/step - loss: 38852561645.4638 Epoch 59/150 1022/1022 [==============================] - 0s 30us/step - loss: 38852092801.7534 Epoch 60/150 1022/1022 [==============================] - 0s 42us/step - loss: 38851620162.6301 Epoch 61/150 1022/1022 [==============================] - 0s 62us/step - loss: 38851149595.5538 Epoch 62/150 1022/1022 [==============================] - 0s 62us/step - loss: 
38850680667.6791 Epoch 63/150 1022/1022 [==============================] - 0s 65us/step - loss: 38850207335.2016 Epoch 64/150 1022/1022 [==============================] - 0s 69us/step - loss: 38849736367.3425 Epoch 65/150 1022/1022 [==============================] - 0s 64us/step - loss: 38849262698.2074 Epoch 66/150 1022/1022 [==============================] - 0s 66us/step - loss: 38848793261.3386 Epoch 67/150 1022/1022 [==============================] - 0s 61us/step - loss: 38848324473.7378 Epoch 68/150 1022/1022 [==============================] - 0s 35us/step - loss: 38847850956.9002 Epoch 69/150 1022/1022 [==============================] - 0s 73us/step - loss: 38847385896.5793 Epoch 70/150 1022/1022 [==============================] - 0s 75us/step - loss: 38846918183.0763 Epoch 71/150 1022/1022 [==============================] - 0s 77us/step - loss: 38846447355.4912 Epoch 72/150 1022/1022 [==============================] - 0s 81us/step - loss: 38845979124.9785 Epoch 73/150 1022/1022 [==============================] - 0s 76us/step - loss: 38845504710.3875 Epoch 74/150 1022/1022 [==============================] - 0s 60us/step - loss: 38845038696.2035 Epoch 75/150 1022/1022 [==============================] - 0s 32us/step - loss: 38844569908.6027 Epoch 76/150 1022/1022 [==============================] - 0s 50us/step - loss: 38844095790.5910 Epoch 77/150 1022/1022 [==============================] - 0s 62us/step - loss: 38843621504.2505 Epoch 78/150 1022/1022 [==============================] - 0s 62us/step - loss: 38843153365.9178 Epoch 79/150 1022/1022 [==============================] - 0s 69us/step - loss: 38842683680.5636 Epoch 80/150 1022/1022 [==============================] - 0s 67us/step - loss: 38842210243.8826 Epoch 81/150 1022/1022 [==============================] - 0s 71us/step - loss: 38841743267.8200 Epoch 82/150 1022/1022 [==============================] - 0s 34us/step - loss: 38841270031.5303 Epoch 83/150 1022/1022 [==============================] - 0s 
51us/step - loss: 38840799360.2505 Epoch 84/150 1022/1022 [==============================] - 0s 63us/step - loss: 38840329474.5049 Epoch 85/150 1022/1022 [==============================] - 0s 66us/step - loss: 38839858915.4442 Epoch 86/150 1022/1022 [==============================] - 0s 69us/step - loss: 38839388532.7280 Epoch 87/150 1022/1022 [==============================] - 0s 63us/step - loss: 38838917364.4775 Epoch 88/150 1022/1022 [==============================] - 0s 69us/step - loss: 38838450797.2133 Epoch 89/150 1022/1022 [==============================] - 0s 66us/step - loss: 38837976238.3405 Epoch 90/150 1022/1022 [==============================] - 0s 66us/step - loss: 38837507106.0665 Epoch 91/150 1022/1022 [==============================] - 0s 48us/step - loss: 38837029918.0587 Epoch 92/150 1022/1022 [==============================] - 0s 31us/step - loss: 38836562589.3072 Epoch 93/150 1022/1022 [==============================] - 0s 34us/step - loss: 38836093553.2211 Epoch 94/150 1022/1022 [==============================] - 0s 39us/step - loss: 38835620733.7456 Epoch 95/150 1022/1022 [==============================] - 0s 56us/step - loss: 38835155320.7358 Epoch 96/150 1022/1022 [==============================] - 0s 67us/step - loss: 38834682421.1037 Epoch 97/150 1022/1022 [==============================] - 0s 66us/step - loss: 38834213946.1135 Epoch 98/150 1022/1022 [==============================] - 0s 78us/step - loss: 38833745302.7945 Epoch 99/150 1022/1022 [==============================] - 0s 67us/step - loss: 38833277949.9961 Epoch 100/150 1022/1022 [==============================] - 0s 66us/step - loss: 38832806332.8689 Epoch 101/150 1022/1022 [==============================] - 0s 53us/step - loss: 38832337501.1820 Epoch 102/150 1022/1022 [==============================] - 0s 40us/step - loss: 38831865956.1957 Epoch 103/150 1022/1022 [==============================] - 0s 58us/step - loss: 38831397685.6047 Epoch 104/150 1022/1022 
[==============================] - 0s 78us/step - loss: 38830925483.3346 Epoch 105/150 1022/1022 [==============================] - 0s 87us/step - loss: 38830453982.4344 Epoch 106/150 1022/1022 [==============================] - 0s 86us/step - loss: 38829983651.8200 Epoch 107/150 1022/1022 [==============================] - 0s 81us/step - loss: 38829512884.3523 Epoch 108/150 1022/1022 [==============================] - 0s 31us/step - loss: 38829046777.9883 Epoch 109/150 1022/1022 [==============================] - 0s 33us/step - loss: 38828575269.0724 Epoch 110/150 1022/1022 [==============================] - 0s 35us/step - loss: 38828102978.6301 Epoch 111/150 1022/1022 [==============================] - 0s 51us/step - loss: 38827629694.2466 Epoch 112/150 1022/1022 [==============================] - 0s 72us/step - loss: 38827161531.8669 Epoch 113/150 1022/1022 [==============================] - 0s 70us/step - loss: 38826688984.9237 Epoch 114/150 1022/1022 [==============================] - 0s 69us/step - loss: 38826218225.4716 Epoch 115/150 1022/1022 [==============================] - 0s 65us/step - loss: 38825744219.6791 Epoch 116/150 1022/1022 [==============================] - 0s 71us/step - loss: 38825274935.1076 Epoch 117/150 1022/1022 [==============================] - 0s 66us/step - loss: 38824804792.8611 Epoch 118/150 1022/1022 [==============================] - 0s 79us/step - loss: 38824336971.1468 Epoch 119/150 1022/1022 [==============================] - 0s 82us/step - loss: 38823862913.2524 Epoch 120/150 1022/1022 [==============================] - 0s 80us/step - loss: 38823389833.2681 Epoch 121/150 1022/1022 [==============================] - 0s 77us/step - loss: 38822922268.0548 Epoch 122/150 1022/1022 [==============================] - 0s 72us/step - loss: 38822453961.3933 Epoch 123/150 1022/1022 [==============================] - 0s 73us/step - loss: 38821983402.3327 Epoch 124/150 1022/1022 [==============================] - 0s 75us/step - loss: 
38821516690.7867 Epoch 125/150 1022/1022 [==============================] - 0s 74us/step - loss: 38821048784.9080 Epoch 126/150 1022/1022 [==============================] - 0s 70us/step - loss: 38820574662.8885 Epoch 127/150 1022/1022 [==============================] - 0s 79us/step - loss: 38820103218.0978 Epoch 128/150 1022/1022 [==============================] - 0s 71us/step - loss: 38819632410.5519 Epoch 129/150 1022/1022 [==============================] - 0s 69us/step - loss: 38819166131.8513 Epoch 130/150 1022/1022 [==============================] - 0s 67us/step - loss: 38818695709.0568 Epoch 131/150 1022/1022 [==============================] - 0s 59us/step - loss: 38818223089.9726 Epoch 132/150 1022/1022 [==============================] - 0s 83us/step - loss: 38817755961.6125 Epoch 133/150 1022/1022 [==============================] - 0s 79us/step - loss: 38817283342.5284 Epoch 134/150 1022/1022 [==============================] - 0s 81us/step - loss: 38816815845.4481 Epoch 135/150 1022/1022 [==============================] - 0s 79us/step - loss: 38816346368.5010 Epoch 136/150 1022/1022 [==============================] - 0s 80us/step - loss: 38815873096.1409 Epoch 137/150 1022/1022 [==============================] - 0s 37us/step - loss: 38815397924.0704 Epoch 138/150 1022/1022 [==============================] - 0s 32us/step - loss: 38814930835.7886 Epoch 139/150 1022/1022 [==============================] - 0s 38us/step - loss: 38814460196.5714 Epoch 140/150 1022/1022 [==============================] - 0s 37us/step - loss: 38813990374.9511 Epoch 141/150 1022/1022 [==============================] - 0s 48us/step - loss: 38813525507.0059 Epoch 142/150 1022/1022 [==============================] - 0s 60us/step - loss: 38813056795.5538 Epoch 143/150 1022/1022 [==============================] - 0s 64us/step - loss: 38812583567.2798 Epoch 144/150 1022/1022 [==============================] - 0s 68us/step - loss: 38812111681.6282 Epoch 145/150 1022/1022 
[==============================] - 0s 68us/step - loss: 38811647018.0822 Epoch 146/150 1022/1022 [==============================] - 0s 66us/step - loss: 38811179933.8082 Epoch 147/150 1022/1022 [==============================] - 0s 63us/step - loss: 38810707531.1468 Epoch 148/150 1022/1022 [==============================] - 0s 88us/step - loss: 38810237477.0724 Epoch 149/150 1022/1022 [==============================] - 0s 87us/step - loss: 38809761735.8904 Epoch 150/150 1022/1022 [==============================] - 0s 90us/step - loss: 38809294607.5303
<keras.callbacks.callbacks.History at 0x20913acf668>
## Get the predictions on train and validation.
pred_train = model1.predict(train_data_final)
pred_test = model1.predict(test_data_final)
## Get predictions on test data.
test_pred = model1.predict(test_data_combine)
## Display RMSE value for train and validation data.
print("Train Error:",sqrt(mean_squared_error(y_train, pred_train)))
print("Test Error:",sqrt(mean_squared_error(y_test, pred_test)))
Train Error: 197358.3163285996
Test Error: 198108.21506572075
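In both runs the loss stays near 3.88e10 and the RMSE near 197,000 dollars, which is what a network would score if its outputs were still close to zero on a target measured in hundreds of thousands. One common remedy, which is an assumption here and not something the original notebook does, is to train on `log1p` of the price and map predictions back with `expm1`. The prices below are made up for illustration.

```python
import numpy as np

# Made-up sale prices standing in for y_train (illustration only).
y_train_demo = np.array([120_000.0, 180_000.0, 250_000.0, 410_000.0])

# A model whose outputs are still near zero scores RMSE = sqrt(mean(y^2)),
# which is on the order of the prices themselves.
zero_pred_rmse = np.sqrt(np.mean(y_train_demo ** 2))

# Train on log1p(price) instead: the target now spans a few units
# rather than hundreds of thousands.
y_train_log = np.log1p(y_train_demo)

# Predictions made on the log scale are mapped back to dollars with expm1.
# Here the "predictions" are just the log targets, to show the round trip.
recovered = np.expm1(y_train_log)

print("RMSE of an all-zero prediction:", zero_pred_rmse)
print("log-scale target range:", y_train_log.min(), y_train_log.max())
print("round trip ok:", np.allclose(recovered, y_train_demo))
```

With the Keras model above, this would amount to fitting on `np.log1p(y_train)` and wrapping the output of `model1.predict(...)` in `np.expm1(...)` before computing the dollar-scale error. Whether it helps on this data would need to be checked by rerunning the cells.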
############################################## PCA ###########################################################################
## Import PCA model library.
from sklearn.decomposition import PCA
## Instantiate PCA model and fit it.
pca = PCA(n_components=2)
principalComponents = pca.fit_transform(X_train_scaler)
## Get dimensions of train data.
train_data_final.shape
(1022, 277)
## Get dimensions of the principal components.
principalComponents.shape
(1022, 2)
## Display principal components.
print(principalComponents)
[[-0.8954133  -2.18631755]
 [-2.66539918  0.43482128]
 [-2.09868063 -0.79797737]
 ...
 [ 2.48650134  0.28487654]
 [-1.46838751 -2.10796887]
 [-1.58717077 -0.00724692]]
## Prepare a dataframe with the principal components.
principalDf = pd.DataFrame(data=principalComponents,
                           columns=['principal component 1', 'principal component 2'])
## Get the first 5 records of the principal components.
principalDf.head()
| | principal component 1 | principal component 2 |
|---|---|---|
| 0 | -0.895413 | -2.186318 |
| 1 | -2.665399 | 0.434821 |
| 2 | -2.098681 | -0.797977 |
| 3 | -0.056531 | -0.031794 |
| 4 | 0.845449 | -2.374163 |
## Get explained variance ratio.
pca.explained_variance_ratio_
array([0.21192618, 0.10387824])
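The two ratios above sum to roughly 0.316, so the two components retain a little under a third of the variance in the scaled features. A rough way to decide how many components to keep is to look at the cumulative ratio. The sketch below reproduces the quantity that `explained_variance_ratio_` reports by taking an SVD of centered data in NumPy; the synthetic matrix `X_demo` and the 0.95 threshold are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a standardized feature matrix (hypothetical data):
# 10 observed features driven by 3 latent factors plus a little noise.
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 10))
X_demo = latent @ mixing + 0.1 * rng.normal(size=(200, 10))

# PCA via SVD of the centered data: squared singular values are
# proportional to the variance captured by each component.
X_centered = X_demo - X_demo.mean(axis=0)
singular_values = np.linalg.svd(X_centered, compute_uv=False)
explained_variance_ratio = singular_values ** 2 / np.sum(singular_values ** 2)

# Cumulative share of variance retained by the first k components.
cumulative = np.cumsum(explained_variance_ratio)

# Smallest number of components whose cumulative ratio reaches the threshold.
threshold = 0.95  # assumed cut-off, tune as needed
n_needed = int(np.searchsorted(cumulative, threshold) + 1)

print("cumulative ratio:", np.round(cumulative, 3))
print("components needed:", n_needed)
```

Applied to the fitted `pca` object, the same idea is `np.cumsum(pca.explained_variance_ratio_)`, provided the PCA was fitted with enough components to reach the chosen threshold.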
## Get components.
pca.components_
array([[ 0.21954716, 0.13919059, 0.25063762, 0.21109311, 0.22612689,
0.18639512, 0.00789492, 0.1209049 , 0.31321585, 0.31675222,
0.11520559, -0.01314929, 0.32642755, 0.12633061, 0.26510875,
0.22884204, 0.31064052, 0.3227923 , 0.139032 , 0.16942426,
-0.08594562, 0.02502759, 0.0376397 , 0.09271425, 0.0159143 ,
0.01729143, -0.01169429],
[ 0.12553739, 0.09021614, -0.32818343, -0.20498363, -0.02720514,
-0.13671027, 0.0109314 , 0.03405799, -0.10444001, -0.03336502,
0.39664523, 0.18821754, 0.31354295, 0.41962082, 0.37018061,
-0.28723628, -0.11852569, -0.11385579, -0.05821965, 0.03671219,
0.19723035, -0.02522341, 0.06950059, 0.14015004, 0.09141491,
0.03752889, -0.0379974 ]])
## Get the number of features seen during fit.
pca.n_features_
27
## Project the train and validation data onto the principal components.
train_pca = pca.transform(X_train_scaler)
test_pca = pca.transform(X_test_scaler)
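For a PCA without whitening, `transform` amounts to subtracting the mean learned during fitting and multiplying by the transposed component matrix. The NumPy sketch below illustrates that on random data; `X_fit` and `X_new` are hypothetical stand-ins for `X_train_scaler` and `X_test_scaler`, and the signs of the projected columns may differ from scikit-learn's output because the direction of each component is only defined up to sign.

```python
import numpy as np

rng = np.random.default_rng(1)
X_fit = rng.normal(size=(50, 5))   # hypothetical stand-in for X_train_scaler
X_new = rng.normal(size=(10, 5))   # hypothetical stand-in for X_test_scaler

# Fit: the mean and the directions come from the training data only.
mean_ = X_fit.mean(axis=0)
_, _, vt = np.linalg.svd(X_fit - mean_, full_matrices=False)
components = vt[:2]                # top two directions, one per row

# Transform: center with the *training* mean, then project.
fit_2d = (X_fit - mean_) @ components.T
new_2d = (X_new - mean_) @ components.T

print(fit_2d.shape, new_2d.shape)
```

Reusing the training mean and components for the validation data is what keeps the two projections in the same coordinate system, which is why the notebook calls `pca.transform` on the validation set instead of fitting a second PCA.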
## Display the transformed train data.
train_pca
array([[-0.89541298, -2.18634211],
[-2.66540047, 0.435051 ],
[-2.09868065, -0.79802406],
...,
[ 2.48650239, 0.28477943],
[-1.46838756, -2.10795479],
[-1.58717065, -0.00720368]])
## Display the transformed validation data.
test_pca
array([[ 1.15631213e+00, 5.95098354e-01],
[-4.51084517e-01, 3.07914898e+00],
[ 1.70842552e-01, -3.86034495e-01],
[ 2.13042556e+00, 3.09347366e-01],
[-1.92999453e+00, 9.29849697e-01],
[ 3.50776552e+00, -1.06220920e+00],
[ 3.46897325e+00, 1.70532045e+00],
[-2.66206957e+00, -4.68360625e-01],
[-4.79949640e-01, -1.35964165e-02],
[-1.51951014e+00, -6.16007414e-01],
[ 5.75927321e-01, -1.68190468e+00],
[ 1.21260292e+00, 5.76601800e-01],
[-8.75171999e-01, -2.57685380e+00],
[-1.87794288e+00, -1.71568191e-01],
[ 2.61642739e+00, 2.37553244e+00],
[ 4.52825404e-01, -7.90378504e-01],
[ 2.44573639e-02, 1.06509568e+00],
[ 4.23283242e+00, -7.68756629e-01],
[ 1.06689672e+00, -2.00248321e-01],
[ 5.81676000e-01, 1.59294388e+00],
[-8.95318581e-01, -5.63821750e-01],
[ 2.50095525e-01, 1.83548069e+00],
[-2.81000784e+00, -4.06339451e-01],
[-1.96392923e+00, 1.30764016e+00],
[ 9.01045732e-01, -3.34261117e-01],
[ 1.08580841e+00, -1.18578419e+00],
[ 1.69796945e+00, -1.31013720e+00],
[ 6.85113862e-01, 1.64829298e-01],
[-1.64203118e+00, -5.60084235e-01],
[-1.34744183e-01, -8.43595522e-01],
[-9.06787824e-01, -3.06272811e-01],
[ 1.53601651e+00, -1.14929645e+00],
[-7.87317929e-01, -3.41414787e-01],
[ 1.60495797e+00, -1.79014071e+00],
[ 4.15783199e+00, -2.97299661e+00],
[ 1.59888023e+00, -1.53721895e+00],
[ 1.00808006e-01, -1.30452524e+00],
[ 4.44819933e+00, 7.30975737e+00],
[ 6.28605006e-01, -5.52121223e-01],
[-3.24248763e+00, 1.56586965e+00],
[-1.02348712e+00, -1.87929718e+00],
[ 1.81907364e+00, -2.16475571e+00],
[ 1.42281589e+00, 6.67058593e-01],
[ 1.09169381e+00, -2.24876358e+00],
[ 1.73841080e+00, -7.87096855e-01],
[-2.85085831e+00, -7.82668820e-01],
[ 4.02556212e+00, -1.22034661e+00],
[ 1.10099290e+00, 1.82921313e-01],
[ 1.94271868e+00, 3.46748783e-02],
[-2.88007069e+00, 1.87603014e+00],
[ 9.72037371e-01, 1.55665603e+00],
[ 5.12336737e-01, -3.36790312e+00],
[-1.01291833e+00, -6.59773884e-01],
[ 1.57729074e-01, -4.19008707e-01],
[-3.19854879e+00, 1.48457420e+00],
[-8.77609394e-01, 8.79886358e-01],
[-2.24898705e+00, -9.51207750e-01],
[ 6.65703150e-01, -9.67564626e-01],
[ 2.37442540e+00, -1.30639424e+00],
[ 4.16994292e+00, -1.17350171e+00],
[-1.51862276e+00, -1.42411530e+00],
[ 3.64927708e+00, 1.32611174e+00],
[-1.77630674e+00, 4.09452419e+00],
[-3.56729582e-01, -1.02624538e+00],
[ 9.34371362e-01, 2.03568283e-01],
[-2.61016086e+00, 3.30155479e+00],
[-3.47932428e+00, 1.32291856e+00],
[ 3.04321680e+00, -1.44706253e+00],
[ 2.13492487e+00, -1.12013206e+00],
[-6.07508391e-01, -5.44293038e-01],
[-1.06487187e+00, -1.85578362e+00],
[ 1.48250602e+00, 1.55901104e+00],
[-8.77895129e-01, -1.84138137e+00],
[ 3.24162650e+00, -1.30819064e+00],
[-5.78243624e-01, 2.48765908e+00],
[-2.37609723e+00, -2.81152823e-01],
[ 2.47215018e-01, -3.34708978e-01],
[-1.20025313e+00, -1.55412327e+00],
[ 1.27032515e+00, 9.64662650e-01],
[-2.09795618e+00, -3.11555082e+00],
[ 3.48267560e+00, -4.77482269e-01],
[ 4.54415875e+00, -1.99007660e+00],
[-4.52901423e+00, -2.99780505e-01],
[-1.98888380e+00, 3.71989732e-02],
[ 3.40050573e+00, -1.16422339e+00],
[-1.13748454e+00, -2.12438847e-01],
[-3.99488094e+00, 2.62272427e-02],
[-2.58950513e+00, 4.43930219e-01],
[-3.43592026e+00, -2.14991395e-02],
[ 3.65538290e+00, -9.50462573e-01],
[-1.75605494e+00, 7.93575681e-01],
[ 5.99155256e-01, 1.47865977e-01],
[ 2.20900346e-01, 1.52401824e-01],
[-1.13667723e+00, 1.84341028e+00],
[-7.47092238e-02, -1.70063443e+00],
[-2.80871136e+00, 1.81525865e+00],
[-5.47329247e-01, 5.02436921e+00],
[-2.74146997e+00, -7.39738863e-01],
[ 1.88755107e+00, 8.35725082e-01],
[-2.40078350e+00, -8.18529469e-01],
[-2.43017051e+00, 2.25149528e+00],
[ 6.50875179e+00, 2.01779985e+00],
[ 4.87777177e+00, 1.66832880e+00],
[-3.20884589e-01, -7.67974751e-01],
[ 1.01775823e+00, -4.14753660e-01],
[-2.46413973e+00, 2.78457874e+00],
[-2.44507526e+00, -8.29428594e-01],
[ 6.02141520e-01, -1.28073036e+00],
[-1.32772651e-01, -7.45960483e-01],
[ 3.71228678e+00, -1.68469832e+00],
[ 3.86735400e+00, 5.84810269e-01],
[ 3.63273998e+00, 2.69770492e+00],
[-3.06440692e+00, 1.18430859e+00],
[ 2.99269328e+00, 4.36819191e-03],
[ 1.00316581e+00, -2.06797775e+00],
[-2.65633465e+00, -2.74943403e+00],
[ 2.75284142e-01, -1.21256911e+00],
[ 5.34729684e+00, 5.34867220e-01],
[-2.69857067e+00, -3.05516168e-01],
[-1.19555198e+00, -1.56606747e+00],
[-5.46867551e-02, 7.10952688e-01],
[ 1.13374286e+00, -1.09704821e+00],
[ 9.31381639e-01, -2.37800815e+00],
[-1.77274064e+00, -1.20908771e+00],
[ 1.07545601e+00, -3.30169745e+00],
[ 3.18039178e+00, -1.72598486e+00],
[-3.20140029e+00, -3.66761268e-01],
[-2.45265046e+00, -1.09522143e+00],
[-6.09623831e-01, -2.16774787e+00],
[ 4.19629362e+00, 2.88844373e-01],
[-8.13024776e-01, 6.74135979e-01],
[ 3.21792102e+00, -2.17157363e+00],
[ 2.75397374e+00, 1.55373289e+00],
[ 3.99965190e-01, 6.03441390e-03],
[ 2.28821382e-01, 2.13210497e+00],
[ 1.10983999e+00, 2.56796529e-01],
[ 3.56531327e+00, 3.79703887e-01],
[-3.59602464e+00, -3.22186393e-01],
[-3.30368864e-01, -1.18964938e+00],
[-2.51885304e+00, -7.99446935e-02],
[-2.18952323e+00, 1.26858541e+00],
[ 2.89395053e+00, 8.59902897e-02],
[ 2.96455241e+00, 8.88974117e-01],
[-8.05398308e-01, -8.89930682e-01],
[ 1.89444828e+00, -1.01423695e+00],
[ 3.89898789e+00, 2.74104382e+00],
[ 3.40115446e+00, -1.46623603e+00],
[-3.27639643e-01, 2.19188614e+00],
[ 9.50302179e-01, -4.28467937e-01],
[-1.42559930e+00, 1.16832970e+00],
[-7.01947444e-02, -5.02355280e-01],
[-1.65565543e+00, -4.30587568e-01],
[ 5.73343019e+00, 1.50101437e+00],
[-3.76911509e-01, -7.69376329e-01],
[ 1.99402868e+00, -1.65433182e+00],
[-4.71912528e-01, -2.69977341e+00],
[ 2.37816740e+00, -9.53778230e-01],
[-1.96728348e+00, 5.49805639e-01],
[-1.23288318e+00, -6.49184944e-01],
[-1.15765620e+00, -1.98417634e+00],
[ 8.24692059e+00, 2.54665248e+00],
[ 1.61960508e+00, 5.77531649e-01],
[ 2.76451774e+00, 1.76829382e+00],
[-1.30906083e+00, 1.29170342e+00],
[ 3.13350683e+00, -1.94894326e+00],
[-2.78432018e+00, 3.03008856e+00],
[-8.45143471e-01, 1.77948228e-01],
[-1.11518258e+00, 8.91119241e-02],
[ 6.62175158e-01, -1.40530716e+00],
[-2.02549035e+00, -2.35229402e-01],
[-1.89636353e+00, -3.83971390e-01],
[-1.04108906e+00, 1.55229552e+00],
[ 5.92004667e-01, 2.18734841e+00],
[ 1.79448034e+00, -3.04515717e+00],
[-7.16705410e-01, -7.84753486e-01],
[ 3.59264995e+00, -4.73598784e-02],
[ 5.96369315e-01, -1.49682352e+00],
[-2.47422575e+00, -1.20878025e-01],
[ 8.28920506e-01, -2.25971769e+00],
[ 3.26595400e+00, -1.37114206e+00],
[-3.61628499e-01, -1.25934606e+00],
[-9.57890683e-01, -2.60073403e+00],
[-1.91596391e+00, -9.13735444e-01],
[ 2.69398113e-01, -1.20988639e+00],
[-2.53892300e-01, 1.89680009e-01],
[-1.78548711e+00, -9.21353820e-01],
[-1.78798239e+00, 2.72383778e+00],
[ 5.51674606e-01, -2.15278069e+00],
[ 2.54655732e-01, -6.17384611e-01],
[-4.51148924e-01, -1.08104641e+00],
[-2.21177015e+00, 7.22333887e-01],
[-1.82278242e+00, 2.05847942e-02],
[ 3.07282200e-01, -3.31547650e-01],
[ 2.48129284e+00, -1.63706683e+00],
[-1.57632369e+00, 9.79735574e-01],
[ 3.69841974e+00, 1.51004555e+00],
[ 1.58917532e-01, -1.82897503e+00],
[-3.18222172e+00, 2.86739755e-01],
[-5.16876840e-01, 2.25759619e-01],
[-3.13378943e+00, -6.06768803e-01],
[ 5.24577500e-01, -4.74949353e-02],
[ 3.70357779e+00, -1.69558235e+00],
[-1.58792535e-01, -2.45263684e-01],
[ 4.36988029e+00, 8.60982568e-01],
[ 1.11561923e+00, 1.52290917e+00],
[-1.67257571e+00, -1.50422926e-01],
[-3.38315000e+00, -1.09305734e+00],
[ 3.85984599e+00, 2.33996049e+00],
[-2.49747556e-01, -9.80131565e-01],
[ 4.98126385e-01, -5.98488159e-01],
[-1.81495567e+00, 2.30506695e+00],
[ 4.61835667e+00, -1.13549778e+00],
[-2.28001942e+00, 2.12275065e-01],
[-1.37065892e+00, 1.16562343e-02],
[-9.88149685e-01, -9.76676474e-01],
[-4.07954919e-01, -2.65545210e+00],
[ 6.03911808e-01, -1.02691513e-01],
[-2.14000392e+00, 1.46799100e+00],
[-1.85682269e+00, -8.08676803e-01],
[-3.39123375e+00, 4.52357203e-01],
[ 3.52922904e-01, -3.32939726e-01],
[-4.33065915e+00, 9.91902488e-01],
[ 2.45318213e+00, 8.56786518e-01],
[-2.23650189e+00, 1.55487422e+00],
[-2.10212555e+00, 2.74391594e-01],
[-2.68177486e+00, 3.42908161e+00],
[-6.50028179e-02, -2.73161535e-01],
[-3.08916399e+00, -8.70875685e-01],
[-5.91746812e-02, -2.76186359e-01],
[-8.60957888e-01, 6.24809298e-01],
[-2.21082142e+00, 2.72732530e+00],
[-1.05770430e+00, -3.66215415e+00],
[ 2.44139362e-01, 2.01278002e+00],
[-2.43978829e+00, 3.07290990e-01],
[-2.10650873e+00, -6.92506424e-01],
[-3.52923973e-01, 1.65786901e-01],
[-1.86436193e-01, 4.79214969e-01],
[-2.02336840e+00, -1.40074254e-01],
[ 4.63034023e-01, -6.64755824e-02],
[-2.88942875e+00, -2.73719486e-02],
[ 2.23561257e+00, -1.15629338e-01],
[ 1.81718838e+00, -6.43812179e-02],
[ 1.52547802e+00, -1.15088479e+00],
[ 1.43279825e+00, 1.43535599e+00],
[-1.55958622e+00, -2.79569080e-01],
[ 2.44724811e+00, 6.25571158e-01],
[-3.36875530e-01, 3.67176523e+00],
[ 3.08693232e+00, -1.78589058e+00],
[-2.22417657e+00, -1.64065183e+00],
[ 2.32781047e+00, -3.53702853e-01],
[ 3.07785389e+00, -1.50385306e+00],
[-2.76609605e+00, 8.66910376e-01],
[ 6.52566474e-01, -3.28009746e-01],
[-3.00131293e+00, 1.45839820e+00],
[ 3.41259025e-02, -1.28182135e-02],
[ 1.18347353e+00, 4.16947196e-01],
[ 2.43177958e+00, -2.28031866e+00],
[ 9.58281856e-01, -2.00809415e+00],
[-9.04530014e-01, -7.49861880e-01],
[-1.81317435e+00, -3.58522055e-01],
[ 1.50239647e+00, -2.57364583e-01],
[ 6.98398027e-01, -2.08482531e+00],
[ 2.43748533e-01, -1.38319311e+00],
[ 7.35087282e-01, 1.85494518e+00],
[ 3.15371584e-01, -8.10563902e-02],
[-5.47791780e-01, 1.85838734e+00],
[ 1.86755031e+00, 1.26761837e+00],
[ 3.95647361e-01, -2.79847886e-01],
[-4.35414283e+00, 6.25103249e-01],
[-2.34452713e+00, -1.52555336e+00],
[-2.06957052e+00, 2.40354676e+00],
[-3.47317949e+00, 4.70451460e-02],
[ 2.11344623e+00, -1.02177579e+00],
[-3.98665127e+00, -1.25659392e+00],
[ 2.94674612e+00, -1.74114614e-01],
[ 2.17666614e-01, 3.28555479e-01],
[ 1.38330917e-01, -2.08089075e+00],
[ 1.79532948e+00, 1.54232738e+00],
[-1.78835779e+00, 1.20101597e+00],
[-2.32887859e+00, -1.55207317e-01],
[-2.38841819e+00, 1.41330139e+00],
[-6.34711976e-01, -4.79627156e-01],
[-2.58020361e+00, 6.25139120e-01],
[-9.37717720e-01, -4.62758415e-02],
[-9.86985845e-02, -1.52024273e+00],
[-2.42590389e+00, 1.27853060e+00],
[ 5.55036983e-01, 5.72680917e-03],
[-2.54401993e+00, 2.20388894e+00],
[-2.51638707e-01, 1.63984745e+00],
[-1.05303240e+00, -1.52146351e-01],
[ 1.38250390e+00, 2.52129640e+00],
[-3.46427081e+00, 9.20146922e-01],
[ 3.71939571e+00, 1.80504033e+00],
[ 5.99819545e-01, -1.85698585e-01],
[ 4.62019370e+00, 2.48683196e+00],
[-3.64409814e-01, 1.25353164e+00],
[-1.69233074e-01, -1.80812471e+00],
[ 1.96238569e+00, -1.16735314e+00],
[-6.29577669e-02, 3.35455389e-01],
[-1.53546477e+00, -2.41605843e-01],
[ 4.30938340e+00, -2.35132954e+00],
[-6.06903926e-01, -4.70377067e-01],
[ 2.21661329e+00, 9.71933747e-02],
[ 3.04282014e+00, 2.18979891e+00],
[ 3.69014412e+00, -1.82136640e+00],
[ 2.70559816e-01, -1.65938255e-01],
[ 4.10394136e+00, -1.54186225e+00],
[ 1.51559284e+00, 3.46707858e-01],
[-8.04695267e-01, -2.24758468e-01],
[-1.85885629e+00, -5.59831066e-01],
[-5.16854087e-01, 4.33392141e-01],
[ 1.13882246e+00, 2.17890716e+00],
[-1.39428244e+00, -1.47420177e-01],
[ 3.21042374e+00, -1.56765882e+00],
[-1.25185050e+00, 3.46940652e+00],
[ 2.24438129e+00, -1.28232515e+00],
[ 3.08746154e+00, -1.39500902e+00],
[-1.57387566e+00, -8.85380807e-01],
[-1.39187114e+00, -3.63792156e+00],
[ 7.92013667e-01, 4.89656935e-01],
[ 3.50848030e+00, -2.00133849e+00],
[ 1.33171351e+00, -6.16441186e-01],
[ 4.95652861e-01, -4.82143161e-01],
[ 6.82196734e-01, 6.40091568e-01],
[ 2.43517043e+00, 2.08166961e+00],
[-1.86339010e-01, 4.21944167e+00],
[-1.26953837e+00, 2.22652917e-01],
[-1.81933604e+00, -1.06303748e+00],
[ 1.00215048e+00, -1.88726277e-01],
[ 1.69471429e+00, -1.65805262e+00],
[-5.96741407e-01, -4.87401971e-01],
[-1.12219184e+00, -3.50511741e-01],
[-1.92214393e+00, 1.98740755e+00],
[ 9.89079473e-01, -2.52507760e+00],
[-4.07818180e+00, -2.53975908e+00],
[ 3.96706220e+00, -7.72726037e-01],
[ 1.81558040e+00, 3.98626479e-01],
[ 2.79866636e+00, -1.57846739e+00],
[ 5.92221987e-03, -4.41519810e-01],
[-9.17712349e-01, -3.95711611e-01],
[-1.62761760e+00, 4.33091103e-02],
[ 5.82477554e+00, -2.83478642e+00],
[ 1.26104474e+00, 2.16293750e+00],
[ 6.88904705e-02, -5.90563937e-01],
[ 3.84324643e+00, 8.76573173e-02],
[-1.52897659e+00, -1.04287007e+00],
[-1.63584542e+00, -1.87040630e+00],
[-1.57286381e+00, -4.07094360e-01],
[ 4.08491227e-01, 1.44140746e+00],
[-1.31881495e+00, -2.21994897e+00],
[ 3.38472766e-02, 4.47504802e+00],
[ 6.91831989e-01, -3.41069016e-01],
[ 2.86225423e+00, -4.35290287e-01],
[-1.40809999e-01, -1.40453530e-01],
[ 2.59674308e+00, -1.79098849e+00],
[-1.47732777e+00, 4.37659131e+00],
[ 6.68796026e-01, -2.50680677e+00],
[-1.01866194e+00, -2.02094156e+00],
[-3.96294287e-01, 1.24949178e+00],
[-1.99802979e+00, -1.52843057e+00],
[-4.08560905e+00, 2.78063063e+00],
[ 5.56780091e-01, -3.42487408e-01],
[ 1.08087535e+00, -2.82120464e-01],
[ 1.04948785e+00, 2.11237749e+00],
[-1.46748145e+00, -1.04283228e+00],
[-7.16890614e-01, -1.18109243e-01],
[-3.29405618e+00, -1.15114457e+00],
[ 2.32363722e-01, -1.31241236e+00],
[ 2.09057768e-01, -2.73220820e+00],
[ 1.76339172e+00, -1.51527341e+00],
[ 1.67366165e+00, 1.17810767e+00],
[-6.56474627e-02, -7.81406720e-01],
[-4.75619311e+00, 2.14778204e-01],
[-3.50282301e+00, -9.20896167e-01],
[-7.29122761e-01, 7.24071637e-01],
[ 3.50887188e-02, -2.84150837e+00],
[-1.70156673e+00, -5.15768821e-01],
[ 1.53363191e+00, 1.08950182e+00],
[ 1.92909456e+00, -1.36937232e+00],
[ 7.98320198e-01, -1.57284625e+00],
[-1.20713101e+00, 2.51696638e+00],
[ 1.59017572e+00, -6.60777853e-01],
[ 7.54503398e-01, -1.89400649e+00],
[-4.24469851e+00, -5.01354797e-01],
[-9.73608007e-02, 2.13295477e+00],
[-1.00284689e+00, -5.86631925e-01],
[ 2.77778877e+00, -7.47727499e-01],
[-1.42602324e+00, -1.04680294e+00],
[-2.41288072e+00, 1.53198863e+00],
[-3.30543565e+00, 3.99574493e-01],
[ 8.09378577e-01, -2.33350508e+00],
[ 7.93951689e-01, 3.91151228e+00],
[-3.80398929e+00, 7.64371361e-01],
[ 2.17068901e-01, -5.57936077e-01],
[ 4.46538066e-01, -4.46126921e-01],
[-3.76520311e-01, -8.54731945e-01],
[ 9.28566985e-01, -2.79685321e+00],
[ 2.12052130e-01, -1.64326002e+00],
[ 2.81943563e+00, -1.45368676e-01],
[-4.62940328e-01, -1.22356111e+00],
[-1.14873835e+00, -1.95487362e+00],
[ 3.17335534e+00, 2.93056829e-01],
[ 2.62183971e+00, -1.60707392e+00],
[-2.31105104e-01, 4.21899933e-01],
[ 2.17332469e-01, -1.01486682e-01],
[-8.96745989e-01, -1.61273816e-01],
[ 1.66338964e+00, -3.28957830e-02],
[ 3.99640621e+00, 3.39454129e+00],
[ 5.64066166e+00, 1.21628484e+00],
[ 5.57799915e-01, 1.26861624e-01],
[-4.57856779e-02, -1.09273522e+00],
[ 2.57489940e+00, -1.40230030e+00],
[-1.21918775e-02, 1.29147714e+00],
[-2.74383615e+00, -8.62573830e-01],
[ 3.91191053e+00, 1.31948002e+00],
[ 8.73179852e-02, -1.11242976e+00],
[-2.20054456e+00, -1.20815707e+00],
[-1.58233779e+00, 9.18079330e-02],
[-2.38355150e+00, -8.34470636e-01],
[ 5.49452416e-02, -2.21502808e+00],
[ 1.97079928e+00, 1.00809263e+00],
[ 1.31037255e+00, 2.23880046e-01],
[-3.98499143e+00, -6.16605614e-01],
[ 1.09099656e+00, 1.53979631e+00],
[ 1.19287725e-01, -4.72687656e-01],
[ 1.49249568e+00, 4.40674660e-01],
[ 1.31982908e+00, 4.72695079e-01],
[ 4.44789138e+00, 1.40907860e+00],
[-1.23409599e+00, 1.45572599e+00],
[-2.64026037e+00, -6.19356619e-01],
[ 2.15796524e+00, 1.43985214e-01],
[ 1.84073279e+00, 1.43395998e+00],
[ 6.06253113e-01, -6.42431247e-01],
[-1.22734384e+00, -1.72533409e+00],
[-2.82398401e+00, 1.54302468e+00],
[-2.35682173e+00, 1.30272353e+00],
[ 2.84583603e+00, 1.32717608e-01],
[ 3.18004639e+00, -1.91290875e-01]])
################################################# Linear Regression ###########################################################
## Import linear regression model library.
from sklearn.linear_model import LinearRegression
## Instantiate regression model and fit a model.
linreg=LinearRegression()
linear_model=linreg.fit(train_data_final,y_train)
## Get the predictions on train and validation data.
pred_train = linear_model.predict(train_data_final)
pred_test = linear_model.predict(test_data_final)
## Get predictions on test data.
test_pred = linear_model.predict(test_data_combine)
## Display RMSE value for train and validation data.
print("Train Error:",sqrt(mean_squared_error(y_train, pred_train)))
print("Test Error:",sqrt(mean_squared_error(y_test, pred_test)))
Train Error: 18987.647924827896
Test Error: 213230686032402.3
#The enormous gap between the train and test error suggests that strong multicollinearity may exist in the data.
#Let's use the variance inflation factor (VIF) to check for multicollinearity and remove the affected attributes.
## Import the VIF function and compute VIF values for the train data.
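The competition metric described at the top is RMSE on the logarithm of the prices, while the errors printed above are on the raw dollar scale. A minimal sketch of the difference, assuming made-up prices and two helper functions (`rmse`, `rmse_log`) that are not part of the original notebook:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-squared error on the raw scale."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def rmse_log(y_true, y_pred):
    """RMSE between log(actual) and log(predicted); needs positive values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if np.any(y_true <= 0) or np.any(y_pred <= 0):
        raise ValueError("log-scale RMSE requires strictly positive prices")
    return rmse(np.log(y_true), np.log(y_pred))

# Hypothetical prices: both predictions are 10% too high,
# one for a cheap house and one for an expensive house.
actual = np.array([100_000.0, 500_000.0])
predicted = actual * 1.10

raw_error = rmse(actual, predicted)      # dominated by the expensive house
log_error = rmse_log(actual, predicted)  # both houses contribute equally
print("Raw RMSE:", raw_error)
print("Log RMSE:", log_error)
```

On the log scale a 10% miss costs the same for both houses. Note that a plain linear model can predict zero or negative prices, in which case the log is undefined and the helper raises an error.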
from statsmodels.stats.outliers_influence import variance_inflation_factor
vif=pd.DataFrame()
vif['Vif']=[variance_inflation_factor(train_data_final.values,i) for i in range(train_data_final.shape[1])]
vif['Variables']=train_data_final.columns.values
C:\Users\nagar\Anaconda3\lib\site-packages\statsmodels\stats\outliers_influence.py:181: RuntimeWarning: divide by zero encountered in double_scalars
  vif = 1. / (1. - r_squared_i)
## Display VIF values for train data.
vif
| | Vif | Variables |
|---|---|---|
| 0 | 3.427303 | LotFrontage |
| 1 | 3.999449 | LotArea |
| 2 | 19.790053 | YearBuilt |
| 3 | 4.192063 | YearRemodAdd |
| 4 | 3.712573 | MasVnrArea |
| 5 | inf | BsmtFinSF1 |
| 6 | inf | BsmtFinSF2 |
| 7 | inf | BsmtUnfSF |
| 8 | inf | TotalBsmtSF |
| 9 | inf | 1stFlrSF |
| 10 | inf | 2ndFlrSF |
| 11 | inf | LowQualFinSF |
| 12 | inf | GrLivArea |
| 13 | 4.432669 | BedroomAbvGr |
| 14 | 7.904046 | TotRmsAbvGrd |
| 15 | 6.608873 | GarageYrBlt |
| 16 | 9.654663 | GarageCars |
| 17 | 10.397869 | GarageArea |
| 18 | 1.720052 | WoodDeckSF |
| 19 | 2.017527 | OpenPorchSF |
| 20 | 2.086494 | EnclosedPorch |
| 21 | 1.291584 | 3SsnPorch |
| 22 | 1.537126 | ScreenPorch |
| 23 | 4011.826053 | PoolArea |
| 24 | 15.981804 | MiscVal |
| 25 | 1.423591 | MoSold |
| 26 | 1.437997 | YrSold |
| 27 | 12.261775 | MSSubClass_160 |
| 28 | 4.485264 | MSSubClass_180 |
| 29 | 52.742532 | MSSubClass_190 |
| 30 | 148.036504 | MSSubClass_20 |
| 31 | 34.876756 | MSSubClass_30 |
| 32 | 2.750924 | MSSubClass_40 |
| 33 | 25.913360 | MSSubClass_45 |
| 34 | 75.497906 | MSSubClass_50 |
| 35 | 123.866874 | MSSubClass_60 |
| 36 | 37.687851 | MSSubClass_70 |
| 37 | 13.421388 | MSSubClass_75 |
| 38 | 44.791574 | MSSubClass_80 |
| 39 | 13.353473 | MSSubClass_85 |
| 40 | inf | MSSubClass_90 |
| 41 | 17.310744 | MSZoning_FV |
| 42 | 5.422079 | MSZoning_RH |
| 43 | 53.826513 | MSZoning_RL |
| 44 | 36.078449 | MSZoning_RM |
| 45 | 3.215226 | Street_Pave |
| 46 | 3.554404 | Alley_NAA |
| 47 | 3.582523 | Alley_Pave |
| 48 | 1.524013 | LotShape_IR2 |
| 49 | 1.880076 | LotShape_IR3 |
| 50 | 1.903818 | LotShape_Reg |
| 51 | 2.887861 | LandContour_HLS |
| 52 | 2.980395 | LandContour_Low |
| 53 | 4.024932 | LandContour_Lvl |
| 54 | 1.926402 | Utilities_NoSeWa |
| 55 | 1.950570 | LotConfig_CulDSac |
| 56 | 1.559877 | LotConfig_FR2 |
| 57 | 1.567949 | LotConfig_FR3 |
| 58 | 2.052214 | LotConfig_Inside |
| 59 | 2.208504 | LandSlope_Mod |
| 60 | 3.973680 | LandSlope_Sev |
| 61 | 1.413362 | Neighborhood_Blueste |
| 62 | 5.419048 | Neighborhood_BrDale |
| 63 | 10.298785 | Neighborhood_BrkSide |
| 64 | 4.386085 | Neighborhood_ClearCr |
| 65 | 14.200442 | Neighborhood_CollgCr |
| 66 | 8.563209 | Neighborhood_Crawfor |
| 67 | 11.539715 | Neighborhood_Edwards |
| 68 | 8.353482 | Neighborhood_Gilbert |
| 69 | 10.310404 | Neighborhood_IDOTRR |
| 70 | 6.711799 | Neighborhood_MeadowV |
| 71 | 5.606135 | Neighborhood_Mitchel |
| 72 | 22.902358 | Neighborhood_NAmes |
| 73 | 4.257706 | Neighborhood_NPkVill |
| 74 | 9.953334 | Neighborhood_NWAmes |
| 75 | 5.938724 | Neighborhood_NoRidge |
| 76 | 7.855056 | Neighborhood_NridgHt |
| 77 | 20.591891 | Neighborhood_OldTown |
| 78 | 5.794227 | Neighborhood_SWISU |
| 79 | 9.321027 | Neighborhood_Sawyer |
| 80 | 8.322533 | Neighborhood_SawyerW |
| 81 | 13.880332 | Neighborhood_Somerst |
| 82 | 3.772372 | Neighborhood_StoneBr |
| 83 | 5.983023 | Neighborhood_Timber |
| 84 | 2.904420 | Neighborhood_Veenker |
| 85 | 4.768181 | Condition1_Feedr |
| 86 | 7.666565 | Condition1_Norm |
| 87 | 1.818198 | Condition1_PosA |
| 88 | 2.455862 | Condition1_PosN |
| 89 | 2.060942 | Condition1_RRAe |
| 90 | 2.922049 | Condition1_RRAn |
| 91 | 1.300616 | Condition1_RRNe |
| 92 | 2.093322 | Condition1_RRNn |
| 93 | 10.237745 | Condition2_Feedr |
| 94 | 22.657792 | Condition2_Norm |
| 95 | 5.059141 | Condition2_PosA |
| 96 | 5.122666 | Condition2_PosN |
| 97 | inf | Condition2_RRAe |
| 98 | 2.964216 | Condition2_RRAn |
| 99 | 4.922677 | Condition2_RRNn |
| 100 | 40.937998 | BldgType_2fmCon |
| 101 | inf | BldgType_Duplex |
| 102 | 22.616166 | BldgType_Twnhs |
| 103 | 47.396503 | BldgType_TwnhsE |
| 104 | 22.965994 | HouseStyle_1.5Unf |
| 105 | 56.428032 | HouseStyle_1Story |
| 106 | 6.019891 | HouseStyle_2.5Fin |
| 107 | 5.289820 | HouseStyle_2.5Unf |
| 108 | 37.302869 | HouseStyle_2Story |
| 109 | 10.332908 | HouseStyle_SFoyer |
| 110 | 24.742278 | HouseStyle_SLvl |
| 111 | 37.393888 | OverallQual_10 |
| 112 | 8.742462 | OverallQual_2 |
| 113 | 36.255497 | OverallQual_3 |
| 114 | 174.456349 | OverallQual_4 |
| 115 | 461.587063 | OverallQual_5 |
| 116 | 436.827555 | OverallQual_6 |
| 117 | 389.839838 | OverallQual_7 |
| 118 | 245.516190 | OverallQual_8 |
| 119 | 71.998566 | OverallQual_9 |
| 120 | inf | OverallCond_2 |
| 121 | inf | OverallCond_3 |
| 122 | inf | OverallCond_4 |
| 123 | inf | OverallCond_5 |
| 124 | inf | OverallCond_6 |
| 125 | inf | OverallCond_7 |
| 126 | inf | OverallCond_8 |
| 127 | inf | OverallCond_9 |
| 128 | 262.399021 | RoofStyle_Gable |
| 129 | 18.323231 | RoofStyle_Gambrel |
| 130 | 240.734519 | RoofStyle_Hip |
| 131 | 11.340809 | RoofStyle_Mansard |
| 132 | inf | RoofStyle_Shed |
| 133 | 2700.766719 | RoofMatl_CompShg |
| 134 | 174.363933 | RoofMatl_Membran |
| 135 | 173.802572 | RoofMatl_Roll |
| 136 | 852.637461 | RoofMatl_Tar&Grv |
| 137 | 517.627943 | RoofMatl_WdShake |
| 138 | 854.590541 | RoofMatl_WdShngl |
| 139 | 4.876335 | Exterior1st_BrkComm |
| 140 | 61.397688 | Exterior1st_BrkFace |
| 141 | inf | Exterior1st_CBlock |
| 142 | 94.614071 | Exterior1st_CemntBd |
| 143 | 213.882917 | Exterior1st_HdBoard |
| 144 | 231.069581 | Exterior1st_MetalSd |
| 145 | 112.564784 | Exterior1st_Plywood |
| 146 | 4.062381 | Exterior1st_Stone |
| 147 | 34.602848 | Exterior1st_Stucco |
| 148 | 425.218706 | Exterior1st_VinylSd |
| 149 | 210.423161 | Exterior1st_Wd Sdng |
| 150 | 32.938163 | Exterior1st_WdShing |
| 151 | 4.615536 | Exterior2nd_AsphShn |
| 152 | 13.245295 | Exterior2nd_Brk Cmn |
| 153 | 30.295943 | Exterior2nd_BrkFace |
| 154 | inf | Exterior2nd_CBlock |
| 155 | 91.359437 | Exterior2nd_CmentBd |
| 156 | 192.666290 | Exterior2nd_HdBoard |
| 157 | 10.701847 | Exterior2nd_ImStucc |
| 158 | 216.099506 | Exterior2nd_MetalSd |
| 159 | 2.861664 | Exterior2nd_Other |
| 160 | 134.825448 | Exterior2nd_Plywood |
| 161 | 4.997426 | Exterior2nd_Stone |
| 162 | 35.572751 | Exterior2nd_Stucco |
| 163 | 399.756520 | Exterior2nd_VinylSd |
| 164 | 194.924924 | Exterior2nd_Wd Sdng |
| 165 | 40.216584 | Exterior2nd_Wd Shng |
| 166 | 29.147382 | MasVnrType_BrkFace |
| 167 | 33.423333 | MasVnrType_None |
| 168 | 11.307175 | MasVnrType_Stone |
| 169 | 1.685112 | MasVnrType_nan |
| 170 | 8.052741 | ExterQual_Fa |
| 171 | 19.876674 | ExterQual_Gd |
| 172 | 25.635366 | ExterQual_TA |
| 173 | 26.442209 | ExterCond_Fa |
| 174 | 125.104408 | ExterCond_Gd |
| 175 | 4.197280 | ExterCond_Po |
| 176 | 148.024295 | ExterCond_TA |
| 177 | 7.677131 | Foundation_CBlock |
| 178 | 9.153874 | Foundation_PConc |
| 179 | 5.887688 | Foundation_Slab |
| 180 | 1.746870 | Foundation_Stone |
| 181 | 1.426486 | Foundation_Wood |
| 182 | 3.673268 | BsmtQual_Fa |
| 183 | 8.423189 | BsmtQual_Gd |
| 184 | inf | BsmtQual_NB |
| 185 | 13.395188 | BsmtQual_TA |
| 186 | 3.721399 | BsmtCond_Gd |
| 187 | inf | BsmtCond_NB |
| 188 | inf | BsmtCond_Po |
| 189 | 5.514594 | BsmtCond_TA |
| 190 | 2.437357 | BsmtExposure_Gd |
| 191 | 2.133235 | BsmtExposure_Mn |
| 192 | 28.763158 | BsmtExposure_NB |
| 193 | 3.543014 | BsmtExposure_No |
| 194 | 2.271547 | BsmtFinType1_BLQ |
| 195 | 3.914006 | BsmtFinType1_GLQ |
| 196 | 2.075776 | BsmtFinType1_LwQ |
| 197 | inf | BsmtFinType1_NB |
| 198 | 2.289346 | BsmtFinType1_Rec |
| 199 | 5.219077 | BsmtFinType1_Unf |
| 200 | 3.992223 | BsmtFinType2_BLQ |
| 201 | 2.786058 | BsmtFinType2_GLQ |
| 202 | 5.027166 | BsmtFinType2_LwQ |
| 203 | inf | BsmtFinType2_NB |
| 204 | 6.671440 | BsmtFinType2_Rec |
| 205 | 22.516866 | BsmtFinType2_Unf |
| 206 | 2.367540 | Heating_GasW |
| 207 | 6.569004 | Heating_Grav |
| 208 | 1.818154 | Heating_OthW |
| 209 | 2.167037 | Heating_Wall |
| 210 | 2.415494 | HeatingQC_Fa |
| 211 | 1.863640 | HeatingQC_Gd |
| 212 | 1.634062 | HeatingQC_Po |
| 213 | 2.679859 | HeatingQC_TA |
| 214 | 3.092275 | CentralAir_Y |
| 215 | 2.479705 | Electrical_FuseF |
| 216 | 2.340425 | Electrical_FuseP |
| 217 | inf | Electrical_Mix |
| 218 | 2.447257 | Electrical_SBrkr |
| 219 | 1.198662 | Electrical_nan |
| 220 | 3.885402 | KitchenQual_Fa |
| 221 | 11.535085 | KitchenQual_Gd |
| 222 | 14.338955 | KitchenQual_TA |
| 223 | 2.692766 | Functional_Maj2 |
| 224 | 5.099321 | Functional_Min1 |
| 225 | 6.448350 | Functional_Min2 |
| 226 | 3.590562 | Functional_Mod |
| 227 | 13.171473 | Functional_Typ |
| 228 | 3.554932 | FireplaceQu_Fa |
| 229 | 18.241309 | FireplaceQu_Gd |
| 230 | 24.980039 | FireplaceQu_NF |
| 231 | 2.401982 | FireplaceQu_Po |
| 232 | 16.986905 | FireplaceQu_TA |
| 233 | 128.433846 | GarageType_Attchd |
| 234 | 8.687245 | GarageType_Basment |
| 235 | 29.559992 | GarageType_BuiltIn |
| 236 | 6.091227 | GarageType_CarPort |
| 237 | 104.829853 | GarageType_Detchd |
| 238 | inf | GarageType_NG |
| 239 | inf | GarageFinish_NG |
| 240 | 2.444350 | GarageFinish_RFn |
| 241 | 4.572504 | GarageFinish_Unf |
| 242 | inf | GarageQual_Fa |
| 243 | inf | GarageQual_Gd |
| 244 | inf | GarageQual_NG |
| 245 | inf | GarageQual_Po |
| 246 | inf | GarageQual_TA |
| 247 | inf | GarageCond_Fa |
| 248 | inf | GarageCond_Gd |
| 249 | inf | GarageCond_NG |
| 250 | inf | GarageCond_Po |
| 251 | inf | GarageCond_TA |
| 252 | 2.102860 | PavedDrive_P |
| 253 | 2.896349 | PavedDrive_Y |
| 254 | inf | PoolQC_Fa |
| 255 | 204.949954 | PoolQC_Gd |
| 256 | 3606.247680 | PoolQC_NP |
| 257 | 2.836423 | Fence_GdWo |
| 258 | 4.707714 | Fence_MnPrv |
| 259 | 1.340711 | Fence_MnWw |
| 260 | 6.083139 | Fence_NF |
| 261 | inf | MiscFeature_NE |
| 262 | inf | MiscFeature_Shed |
| 263 | inf | MiscFeature_TenC |
| 264 | 1.490160 | SaleType_CWD |
| 265 | 1.310236 | SaleType_Con |
| 266 | 2.142271 | SaleType_ConLD |
| 267 | 1.500495 | SaleType_ConLI |
| 268 | 2.429533 | SaleType_ConLw |
| 269 | 43.538066 | SaleType_New |
| 270 | 1.414532 | SaleType_Oth |
| 271 | 5.973698 | SaleType_WD |
| 272 | 2.085595 | SaleCondition_AdjLand |
| 273 | 2.235964 | SaleCondition_Alloca |
| 274 | 1.549010 | SaleCondition_Family |
| 275 | 3.882058 | SaleCondition_Normal |
| 276 | 40.600436 | SaleCondition_Partial |
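The `inf` entries in the table come from the formula VIF = 1 / (1 - R²): when a column can be reproduced exactly from the other columns, R² equals 1 and the denominator is zero. This is probably what happens for columns such as TotalBsmtSF, which appears to be the sum of the other basement area columns. A minimal sketch on synthetic data, assuming a hand-written helper `vif_manual` that adds an intercept (unlike the statsmodels call above, so values on real data may differ):

```python
import numpy as np

def vif_manual(X, j):
    """VIF of column j: regress it on the other columns plus an intercept."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    design = np.column_stack([np.ones(len(y)), others])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    # R^2 == 1 means the column is an exact linear combination of the others.
    return np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)

rng = np.random.default_rng(0)
n = 200
fin = rng.uniform(0, 1000, n)    # stand-in for a finished basement area
unf = rng.uniform(0, 1000, n)    # stand-in for an unfinished basement area
total = fin + unf                # exact sum, like a "total" column
porch = rng.uniform(0, 300, n)   # unrelated feature

X = np.column_stack([fin, unf, total, porch])
vifs = [vif_manual(X, j) for j in range(X.shape[1])]
for name, value in zip(["fin", "unf", "total", "porch"], vifs):
    print(f"{name:>6}: {value}")
```

Dropping any one column of an exactly dependent group (for example the total) is usually enough to bring the remaining VIF values back to a finite range.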
################################################# Perform Grid Search, Ridge, Lasso ##############################################
## Import Ridge and Lasso model libraries.
from sklearn.linear_model import Ridge, Lasso
## Ridge
## Import Grid search library.
from sklearn.model_selection import GridSearchCV
## Ridge regression takes a regularization parameter alpha: the larger alpha is, the more the coefficient magnitudes are shrunk.
## We need to find the alpha that gives the best predictions on held-out data, so we try several values and pick the best.
## We do this with a grid search over the candidate alpha values.
alphas = np.array([1,0.1,0.01,0.001,0.0001,0,1.5,2]) ## Pick the best of these values.
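The effect of alpha on coefficient size can be checked directly. A minimal sketch, assuming synthetic data and illustrative alpha values that are not the ones from the grid above:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_coef = np.array([3.0, -2.0, 0.0, 1.0, 0.5])   # made-up coefficients
y = X @ true_coef + rng.normal(scale=0.5, size=100)

coef_norms = {}
for alpha in [0.01, 1.0, 100.0]:
    model = Ridge(alpha=alpha).fit(X, y)
    # The Euclidean norm of the coefficient vector shrinks as alpha grows.
    coef_norms[alpha] = float(np.linalg.norm(model.coef_))
    print(f"alpha={alpha:>6}: ||coef|| = {coef_norms[alpha]:.3f}")
```

A very small alpha behaves almost like ordinary least squares, while a very large alpha pushes all coefficients toward zero.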
## Create and fit a ridge regression model, testing each alpha.
model_ridge = Ridge()
grid = GridSearchCV(estimator=model_ridge, param_grid=dict(alpha=alphas),cv=10) ## cv=10 runs 10-fold cross-validation: the score is computed on each of 10 held-out chunks of the data and the average is reported.
grid.fit(train_data_final,y_train)
print(grid)
GridSearchCV(cv=10, error_score=nan,
estimator=Ridge(alpha=1.0, copy_X=True, fit_intercept=True,
max_iter=None, normalize=False, random_state=None,
solver='auto', tol=0.001),
iid='deprecated', n_jobs=None,
param_grid={'alpha': array([1.0e+00, 1.0e-01, 1.0e-02, 1.0e-03, 1.0e-04, 0.0e+00, 1.5e+00,
2.0e+00])},
pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
scoring=None, verbose=0)
## Print the best cross-validated score (R^2 by default for regressors) and the best alpha.
print(grid.best_score_)
print(grid.best_estimator_.alpha)
0.7608287095091045
2.0
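With `scoring=None`, the best score printed above is an average R² value, not an RMSE. If the search should be judged on the same scale as the train and test errors, a scorer can be passed explicitly. A minimal sketch, assuming synthetic data, 5 folds, and an illustrative alpha grid:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
y = X @ np.array([2.0, -1.0, 0.5, 0.0]) + rng.normal(scale=0.3, size=120)

param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(
    Ridge(),
    param_grid,
    cv=5,
    scoring="neg_root_mean_squared_error",  # scikit-learn maximizes, hence the sign
)
search.fit(X, y)

best_alpha = search.best_params_["alpha"]
cv_rmse = -search.best_score_   # flip the sign back to a positive RMSE
print("Best alpha:", best_alpha)
print("Cross-validated RMSE:", cv_rmse)
```

The selected alpha can differ between scorers, so it is worth choosing the scorer that matches the metric being reported.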
## Instantiate Ridge and fit it.
Ridge_model = Ridge(alpha=2) ## normalize=False was the default; the argument was removed in scikit-learn 1.2.
Ridge_model.fit(train_data_final,y_train) ## Fit on the train data to obtain the coefficients.
Ridge(alpha=2, copy_X=True, fit_intercept=True, max_iter=None, normalize=False,
random_state=None, solver='auto', tol=0.001)
## Get the predictions on train and validation data.
pred_train = Ridge_model.predict(train_data_final)
pred_test = Ridge_model.predict(test_data_final)
## Get predictions on test data.
test_pred = Ridge_model.predict(test_data_combine)
## Display RMSE value for train and validation data.
print("Train Error:",sqrt(mean_squared_error(y_train, pred_train)))
print("Test Error:",sqrt(mean_squared_error(y_test, pred_test)))
Train Error: 23581.943193260486
Test Error: 38789.74028716411
## Lasso
## Get the best parameter values by doing a grid search.
model_lasso = Lasso()
grid = GridSearchCV(estimator=model_lasso, param_grid=dict(alpha=alphas),cv=10) ## cv=10 runs 10-fold cross-validation: the score is computed on each of 10 held-out chunks of the data and the average is reported.
grid.fit(train_data_final,y_train)
print(grid)
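Lasso is fitted by coordinate descent, which can be slow to converge when features sit on very different scales or when alpha is close to 0. A minimal sketch of two common remedies, standardizing the features inside a pipeline and raising `max_iter`, assuming synthetic data and an illustrative alpha of 0.1:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic features on very different scales (made up for illustration).
scales = np.array([1.0, 10.0, 100.0, 1000.0, 1.0, 1.0])
X = rng.normal(size=(150, 6)) * scales
y = X @ np.array([5.0, 0.5, 0.05, 0.005, 0.0, 0.0]) + rng.normal(scale=1.0, size=150)

# Scale first, then fit Lasso with a generous iteration budget.
lasso = make_pipeline(StandardScaler(), Lasso(alpha=0.1, max_iter=50_000))
lasso.fit(X, y)

fitted = lasso.named_steps["lasso"]
r2_train = lasso.score(X, y)
print("Iterations used:", fitted.n_iter_)
print("Non-zero coefficients:", int(np.sum(fitted.coef_ != 0)))
print("Train R^2:", r2_train)
```

A pipeline like this can be passed to `GridSearchCV` as the estimator, with the grid key written as `lasso__alpha`; leaving alpha=0 out of the grid avoids the case where scikit-learn itself recommends `LinearRegression` instead.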
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 132283552276.12665, tolerance: 586416997.6105675 positive)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:515: UserWarning: With alpha=0, this algorithm does not converge well. 
You are advised to use the LinearRegression estimator estimator.fit(X_train, y_train, **fit_params) C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged. positive) C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 159824788664.6952, tolerance: 595517884.9173822 positive) C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:515: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator estimator.fit(X_train, y_train, **fit_params) C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged. positive) C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 161806200612.03137, tolerance: 621049274.6193085 positive) C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:515: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator estimator.fit(X_train, y_train, **fit_params) C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged. positive) C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. 
Duality gap: 166235663357.0879, tolerance: 599486045.6069565 positive) C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:515: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator estimator.fit(X_train, y_train, **fit_params) C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged. positive) C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 168872976003.698, tolerance: 600818404.6004866 positive) C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:515: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator estimator.fit(X_train, y_train, **fit_params) C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged. positive) C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 169717899135.38824, tolerance: 612724923.8869787 positive) C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 131917434426.06377, tolerance: 586416997.6105675 positive) C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. 
Duality gap: 151619836495.39517, tolerance: 600453299.974236 positive) C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 149420849916.41656, tolerance: 620994499.8141209 positive) C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 123406506682.24747, tolerance: 615809968.3676016 positive) C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 143205247619.85175, tolerance: 568848936.1139773 positive) C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 156085028299.17822, tolerance: 595517884.9173822 positive) C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 158535821556.43405, tolerance: 621049274.6193085 positive) C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 162112141735.13416, tolerance: 599486045.6069565 positive) C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. 
Duality gap: 2248601315.1430054, tolerance: 600818404.6004866 positive) C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 164730061152.02676, tolerance: 612724923.8869787 positive) C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 132097157478.19542, tolerance: 586416997.6105675 positive) C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 152040330790.3219, tolerance: 600453299.974236 positive) C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 135269185357.28479, tolerance: 620994499.8141209 positive) C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 123909094784.74976, tolerance: 615809968.3676016 positive) C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 145432817081.4537, tolerance: 568848936.1139773 positive) C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. 
Duality gap: 156409232806.73566, tolerance: 595517884.9173822 positive) C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 158987043816.8599, tolerance: 621049274.6193085 positive) C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 162330488914.3558, tolerance: 599486045.6069565 positive) C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 1576156088.5111694, tolerance: 600818404.6004866 positive) C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 164960209951.37195, tolerance: 612724923.8869787 positive)
GridSearchCV(cv=10, error_score=nan,
estimator=Lasso(alpha=1.0, copy_X=True, fit_intercept=True,
max_iter=1000, normalize=False, positive=False,
precompute=False, random_state=None,
selection='cyclic', tol=0.0001, warm_start=False),
iid='deprecated', n_jobs=None,
param_grid={'alpha': array([1.0e+00, 1.0e-01, 1.0e-02, 1.0e-03, 1.0e-04, 0.0e+00, 1.5e+00,
2.0e+00])},
pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
scoring=None, verbose=0)
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 181238574198.1571, tolerance: 669162392.3363616 positive)
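The warnings above point at two things: the coordinate-descent solver runs out of iterations, and alpha=0 is not a sensible value for Lasso. One possible way to address both is sketched below, assuming the features can be standardized before fitting. The synthetic data from make_regression is only a stand-in for train_data_final and y_train, and the names X_demo, y_demo, pipe and grid_demo are hypothetical, not part of the notebook.

```python
import warnings

import numpy as np
from sklearn.datasets import make_regression
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for train_data_final / y_train (hypothetical data).
X_demo, y_demo = make_regression(
    n_samples=200, n_features=20, n_informative=5, noise=10.0, random_state=0
)

# Standardize the features and give coordinate descent more iterations.
pipe = make_pipeline(StandardScaler(), Lasso(max_iter=10000))

# Search strictly positive alphas only; alpha=0 is ordinary least squares,
# for which sklearn recommends LinearRegression instead.
alphas = np.array([1e-4, 1e-3, 1e-2, 1e-1, 1.0, 1.5, 2.0])
grid_demo = GridSearchCV(pipe, param_grid={"lasso__alpha": alphas}, cv=10)

# Record warnings so the number of ConvergenceWarnings can be counted.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    grid_demo.fit(X_demo, y_demo)

n_conv = sum(issubclass(w.category, ConvergenceWarning) for w in caught)
print("Best alpha:", grid_demo.best_params_["lasso__alpha"])
print("Best CV score (R^2):", grid_demo.best_score_)
print("ConvergenceWarnings raised:", n_conv)
```

Putting the scaler inside the pipeline means it is refit on each training fold, so the validation folds do not leak into the scaling statistics.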
## Display best parameters.
print(grid.best_score_)
print(grid.best_estimator_.alpha)
0.6587272788673987
1.0
## Instantiate Lasso with the best alpha found by the grid search (1.0) and fit it.
Lasso_model = Lasso(alpha=1.0, normalize=False)
Lasso_model.fit(train_data_final, y_train)  ## Fit on the train data to obtain the coefficients.
C:\Users\nagar\Anaconda3\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 181238574198.1571, tolerance: 669162392.3363616 positive)
Lasso(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=1000,
normalize=False, positive=False, precompute=False, random_state=None,
selection='cyclic', tol=0.0001, warm_start=False)
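Since the model is fitted to obtain the coefficients, it may help to look at how many of them Lasso actually keeps. The following is a minimal sketch on synthetic data, not the notebook's own output: X_demo, y_demo and lasso_demo are hypothetical stand-ins for train_data_final, y_train and Lasso_model.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic stand-in for the training data (hypothetical).
X_demo, y_demo = make_regression(
    n_samples=200, n_features=20, n_informative=5, noise=5.0, random_state=0
)

lasso_demo = Lasso(alpha=1.0, max_iter=10000).fit(X_demo, y_demo)

# Lasso's L1 penalty can shrink coefficients exactly to zero.
coef = lasso_demo.coef_
kept = np.flatnonzero(coef)
print(f"{kept.size} of {coef.size} coefficients are non-zero")

# List the largest remaining coefficients by absolute value.
order = kept[np.argsort(-np.abs(coef[kept]))]
for idx in order[:5]:
    print(f"feature {idx}: {coef[idx]:.3f}")
```

With a pandas DataFrame as input, the same indices could be mapped back to column names to see which features survive the penalty.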
## Get the predictions on train and validation data.
pred_train = Lasso_model.predict(train_data_final)
pred_test = Lasso_model.predict(test_data_final)
## Get predictions on test data.
test_pred = Lasso_model.predict(test_data_combine)
## Display RMSE value for train and validation data.
print("Train Error:", sqrt(mean_squared_error(y_train, pred_train)))
print("Test Error:", sqrt(mean_squared_error(y_test, pred_test)))
Train Error: 19105.464258594904
Test Error: 260630.3318911821
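The errors above are RMSE values in dollars, whereas the competition metric is the RMSE between the logarithm of the predicted price and the logarithm of the observed price. A small sketch of that metric follows. The helper name rmse_log and its floor parameter are assumptions made for illustration: a linear model can predict zero or negative prices, so predictions are clipped to a small positive value before taking logs.

```python
import numpy as np


def rmse_log(y_true, y_pred, floor=1.0):
    """RMSE between log(y_pred) and log(y_true).

    Predictions below `floor` are clipped so the logarithm is defined
    (an assumption of this sketch, not something the competition specifies).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), floor, None)
    return float(np.sqrt(np.mean((np.log(y_pred) - np.log(y_true)) ** 2)))


# Toy prices, for illustration only.
y_obs = np.array([100000.0, 200000.0, 400000.0])
y_hat = np.array([110000.0, 180000.0, 400000.0])

print("RMSE (dollars):", float(np.sqrt(np.mean((y_hat - y_obs) ** 2))))
print("RMSE (log):", rmse_log(y_obs, y_hat))
```

In the notebook this would be called as rmse_log(y_train, pred_train) and rmse_log(y_test, pred_test), which puts the train and validation errors on the same scale as the leaderboard score.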